PREFACE


In recent years, linear mathematical models have assumed an important 
role in almost all the physical and social sciences, and as was to be expected,
this development has stimulated a remarkable growth of interest
in linear algebra. It is therefore surprising that the number and variety 
of volumes written on linear algebra seem not to have kept pace with the 
diversified needs of those in such fields as mathematics, engineering, 
economics, operations research, and business. This text, however, represents
an effort to meet the needs not only of those studying mathematics,
but also of those working in the physical and social sciences. It is intended 
to be a reasonably rigorous, but not abstract, text. It was written with 
the intention that it could be read and used by those with a limited 
mathematical background. 

An attempt has been made to introduce new ideas slowly and carefully, 
and to give the reader a good intuitive "feeling" for the subject. The abstract
axiomatic development, while it has many things to recommend it,
did not seem appropriate here and was not used. Many numerical examples
are given and, insofar as possible, each important idea is illustrated by an
example, so that the reader who does not follow the theoretical development
may assimilate the material by studying the example.

For simplicity, all scalars in the text are assumed to be real numbers, 
although it is pointed out in appropriate places that the results hold for 
complex numbers as well. However, the author believes that students, 
especially engineers and physicists, should have the opportunity to solve
problems involving complex numbers, and therefore such problems are 
included at the end of the appropriate chapters. It is interesting to note 
that in many mathematics texts which are more general in their presentation
and allow the scalars to be elements of a field, it is not possible to
find a single problem requiring the use of complex numbers. 

Those who expect to make use of linear algebra must have some awareness
of the problems involved in making numerical computations. Consequently,
numerical techniques are discussed in somewhat greater detail
than is usual. 

A novel feature of the text is the inclusion of a chapter covering certain 
topics in convex sets and n-dimensional geometry. It also contains an 
elementary discussion of some of the properties of sets of linear inequalities. 

The author considers the problems at the end of each chapter to be very 
important, and the reader should examine all and work a fair number of
them. They contain additional theoretical developments as well as routine
exercises. 

This text might be used in a variety of ways: for a one-semester or
one-quarter course in linear algebra; as a text or reference for part of a course
in engineering mathematics, mathematical physics, or mathematical economics;
as a supplementary text for courses in linear programming,
quantum mechanics, classical mechanics, etc.

The author is especially indebted to Professor J. H. Van Vleck, who 
first impressed upon him the importance of gaining a firm intuitive grasp 
of any technical subject. With respect to the text itself, the suggestions 
of Professors H. Houthakker, H. Wagner (especially his insistence on 
numerous examples), and of several other (unknown) reviewers were helpful.
Jackson E. Morris provided a number of the quotations which appear
at the beginning of the chapters. The School of Industrial Management,
Massachusetts Institute of Technology, very generously provided secretarial
assistance for typing the manuscript.


G. H.


CHAPTER 1


INTRODUCTION 

. . . to pursue mathematical analysis while at the same
time turning one’s back on its applications and on 
intuition is to condemn it to hopeless atrophy 

R. Courant. 

1-1 Linear models. A large part of the history of physical science is 
a record of the continuous human striving for a formulation of concepts 
which will permit description of the real world in mathematical terms. 
The more recent history of the social sciences (notably economics) also 
reveals a determined attempt to arrive at more quantitatively substantiated
theories through the use of mathematics. To define mathematically
some part of the real world, it is necessary to develop an appropriate 
mathematical model relating the one or more relevant variables. The 
purpose of the model might be, for example, to determine the distance of 
the earth from the sun as a function of time, or to relate the boiling point 
of water to the external pressure, or to determine the best way of blending 
raw refinery stocks to yield aviation and motor fuels. A model will consist
of one or more equations or inequalities. These may involve only the
variables, or they may involve variables and their derivatives (that is, 
differential equations), or values of variables at different discrete times 
(difference equations), or variables related in other ways (through integral
or integro-differential equations, for example). It is not necessarily
true that the variables can be determined exactly by the model. They
may be random variables and, in this case, only their probability distributions
can be found.

No model is ever an exact image of the real world. Approximations are 
always necessary. In some cases, models of a high degree of accuracy can 
be developed so that the values obtained will be correct to ten or more 
decimal places. In other situations, the best models available may yield 
values which differ by more than 100% from the results of actual physical 
measurements. In fact, at times we expect a model to serve only one purpose,
i.e., to predict, in a qualitative manner, the behavior of the variables.
The accuracy required from a model depends upon the ultimate use for 
which it was devised. 

The real world can frequently be represented with sufficient accuracy 
by so-called linear models. Linearity is a very general concept: There are
linear equations in the variables, linear ordinary and partial differential
equations, linear difference equations, linear integral equations, etc. All
linear models have the properties of additivity and homogeneity. Additivity
means: If a variable $x_1$ produces an effect $\alpha_1$ when used alone, and a
variable $x_2$ produces an effect $\alpha_2$ when used alone, then $x_1$, $x_2$ used together
produce the effect $\alpha_1 + \alpha_2$. Homogeneity implies that if a variable $x_1$
produces an effect $\alpha_1$, then for any real number $\lambda$, $\lambda x_1$ produces an effect
$\lambda\alpha_1$. These remarks must be rather vague at present. The precise mathematical
definition of linearity will be given later.

From a mathematical point of view, linear models are of great advantage.
The mathematics of nonlinear systems almost always presents
considerable difficulty to analytic treatment and usually requires the 
use of digital computers to obtain numerical solutions. In many cases, 
even large-scale digital computers are of no assistance. However, it is 
often relatively easy to work with linear models and to find analytic or 
numerical solutions for the quantities of interest. Both factors, ease of 
manipulation and sufficiently accurate approximation of the real world, 
have made linear models the most popular and useful tools in the physical 
and social sciences. 

1-2 Linear algebra. Almost all linear models lead to a set of simultaneous
linear equations or inequalities, although the original model may
consist of a set of linear differential or difference equations. The variables 
in the set of simultaneous linear equations will not necessarily be the 
physical variables in the original model; however, they will be in some 
way related to them. We know from elementary algebra that a set of m 
simultaneous linear equations has the form 


$$
\begin{aligned}
a_{11}x_1 + \cdots + a_{1n}x_n &= r_1, \\
a_{21}x_1 + \cdots + a_{2n}x_n &= r_2, \\
&\ \ \vdots \\
a_{m1}x_1 + \cdots + a_{mn}x_n &= r_m.
\end{aligned}
\tag{1-1}
$$

The coefficients $a_{ij}$ are usually known constants. In some cases the $r_i$
are given constants, and the $x_j$, $j = 1, \ldots, n$, are the variables which
must satisfy the equations (1-1). In other instances, both the $r_i$ and $x_j$
are variables, and the $r_i$ are related to the $x_j$ by (1-1).

The general concept of linearity, introduced in the preceding section,
can be made more concrete for a set of equations such as (1-1). The
contribution of variable $x_j$ to $r_i$ is $\alpha_{ij} = a_{ij}x_j$, and the sum of the
contributions of all the variables yields $r_i$. If $x_j$ is changed to $\hat{x}_j = \lambda x_j$, the
contribution to $r_i$ of $\hat{x}_j$ is $a_{ij}\hat{x}_j = \lambda a_{ij}x_j = \lambda\alpha_{ij}$. This equality expresses
the homogeneity property. If the contribution to $r_i$ of $x_j$ were $\alpha_{ij} = a_{ij}x_j^2$,
the expression would not be homogeneous since the contribution of
$\hat{x}_j = \lambda x_j$ would be $\lambda^2\alpha_{ij}$. Linear models in a given set of variables cannot
involve any powers of the variables other than the first; neither can they
involve such functions as $\log x_j$ or $\exp x_j$. Referring again to Eq. (1-1),
we note that the contribution to $r_i$ of $x_j$ and $x_k$ when used together is
$a_{ij}x_j + a_{ik}x_k = \alpha_{ij} + \alpha_{ik}$, that is, the sum of the individual contributions
of $x_j$ and $x_k$. This expression indicates the additivity property which is
characteristic of linearity. This property rules out the possibility that
products of the variables, such as $x_jx_k$, can appear in linear models since
the contribution of $x_k$ depends on the value of $x_j$, and the combined
contribution of $x_j$ and $x_k$ is not the sum of the contributions of $x_j$, $x_k$ when
used separately.
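
The homogeneity and additivity properties can be spot-checked numerically. The following is a minimal sketch, not a proof: the function names, the coefficients 3 and 4, and the test values are all hypothetical and chosen only for illustration. It tests one linear row of the form appearing in Eq. (1-1) and one product term at a single point.

```python
def linear_effect(coeffs, xs):
    """Sum of the contributions a_ij * x_j, as in one row of Eq. (1-1)."""
    return sum(a * x for a, x in zip(coeffs, xs))


def product_effect(xs):
    """A nonlinear 'effect' x_j * x_k, used here only as a counterexample."""
    xj, xk = xs
    return xj * xk


def is_additive_at(effect, xj, xk):
    """Spot check: the joint effect equals the sum of the separate effects."""
    return effect((xj, xk)) == effect((xj, 0)) + effect((0, xk))


def is_homogeneous_at(effect, xs, lam):
    """Spot check: scaling every variable by lam scales the effect by lam."""
    return effect(tuple(lam * x for x in xs)) == lam * effect(xs)


def row(xs):
    # Hypothetical coefficients a_ij = 3 and a_ik = 4 (assumed values).
    return linear_effect((3, 4), xs)


# The linear row passes both checks at the chosen point.
print(is_additive_at(row, 2, 7), is_homogeneous_at(row, (2, 7), 5))
# The product term fails both checks at the same point.
print(is_additive_at(product_effect, 2, 7),
      is_homogeneous_at(product_effect, (2, 7), 5))
```

A check at one point can only refute linearity, never establish it; the sketch is meant to make the two definitions concrete rather than to serve as a test for linearity in general.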

Linear algebra developed from studies of systems of linear equations, 
such as Eq. (1-1), as it became necessary to derive theoretical results for 
such systems and to invent simplified notations for their manipulation. 
Further generalizations have led to a branch of mathematics with wide 
applicability in the physical and social sciences. The techniques of linear 
algebra can be extended to all linear models; in addition, they also simplify 
operations with a wide variety of nonlinear models. The purpose of this 
text is to provide a succinct and comprehensive introduction to linear 
algebra which will enable the reader to use the subject in his own particular
field. No attempt will be made to illustrate in detail the usefulness
of the subject for any particular field; however, we shall discuss occasionally
applications of the techniques to specific models, such as, for
example, linear programming.

Before turning directly to the subject matter, we shall mention briefly 
several linear models and note that the theory of linear algebra can be 
used to great advantage in their analysis. We do not expect that the 
reader will understand all (or even any) of the models. They are presented 
only for the purpose of illustrating the range of problems to which linear 
algebra is applicable. 

1-3 Leontief’s interindustry model of an economy. In the early 1930’s,
Professor Wassily Leontief of Harvard University developed an interesting
linear model of the national economy. This model assumes that the
economy consists of a number of interacting industries, each of which is
imagined to produce only a single good and to use only one process of production.
For example, steel manufacture and agriculture might be treated
as industries. To produce its given good, each industry will have to purchase
goods from other industries; for example, the automobile industry
purchases steel from the steel industry and tires from the rubber industry.
In addition to selling its good to other industries, a given industry will,
in general, be called upon to meet exogenous demands from consumers,
the government, or foreign trade.

Suppose that there are $n$ different industries. Imagine also that, in a
given year, each industry produces just enough to meet the exogenous
demand and the demand of other industries for its good. Let $x_i$ be the
quantity of good $i$ produced by industry $i$ in a given year. Assume that
in this year industry $j$ will require $y_{ij}$ units of good $i$, and that the exogenous
demand for $i$ will be $b_i$. Thus, if industry $i$ produces exactly enough
to meet the demand, we obtain

$$ x_i = y_{i1} + y_{i2} + \cdots + y_{in} + b_i. \tag{1-2} $$

Equation (1-2) allows for the possibility that industry i may use some of 
its own good. We obtain a balance equation (1-2) for each industry i. 

If industry $j$ is going to produce $x_j$ units of good $j$, we have to know
how many units of good i will be required. Clearly, the answer depends 
on the technology of the industry. Here, Leontief makes the important 
assumption that the amount of good i required to produce good j is 
directly proportional to the amount of good j produced, that is, 

$$ y_{ij} = a_{ij}x_j, \tag{1-3} $$

where $a_{ij}$, the constant of proportionality, depends on the technology of
industry $j$.

Substitution of (1-3) into (1-2) yields 

$$ x_i - a_{i1}x_1 - a_{i2}x_2 - \cdots - a_{in}x_n = b_i $$

for each $i$, and we obtain the following set of simultaneous linear equations:

$$
\begin{aligned}
(1 - a_{11})x_1 - a_{12}x_2 - \cdots - a_{1n}x_n &= b_1, \\
-a_{21}x_1 + (1 - a_{22})x_2 - \cdots - a_{2n}x_n &= b_2, \\
&\ \ \vdots \\
-a_{n1}x_1 - a_{n2}x_2 - \cdots + (1 - a_{nn})x_n &= b_n.
\end{aligned}
\tag{1-4}
$$

This is a set of $n$ equations in $n$ unknowns; intuitively, we suspect that
we should be able to solve for a unique set of $x_j$ which will satisfy these
equations. Thus, by specifying the exogenous demands $b_i$, we expect to
determine how much each industry in the economy should produce in
order to meet the exogenous demands plus the demands of other industries.
Naturally, the relation between the $x_j$ and the $b_i$ depends on the technologies
of the industries, that is, on the $a_{ij}$.

Leontief also shows that the same type of analysis can be used to determine
the prices which should prevail in the hypothetical economy. The
technological coefficient $a_{ij}$ can be viewed as the number of units of product
$i$ required to produce one unit of product $j$. Let $p_j$ be the price of one
unit of $j$. Then the cost of materials required to turn out one unit of $j$ is

$$ a_{1j}p_1 + \cdots + a_{nj}p_n. $$


The difference between the price of one unit of j and the cost of materials 
required to produce this one unit is called the value added by industry j 
and will be denoted by $r_j$. Thus*

$$ p_j - \sum_{i=1}^{n} a_{ij}p_i = r_j, \qquad j = 1, \ldots, n. \tag{1-5} $$

The value added may include labor, profit, etc. Equation (1-5) represents
a set of $n$ simultaneous linear equations in the $n$ prices $p_j$, that is,

$$
\begin{aligned}
(1 - a_{11})p_1 - a_{21}p_2 - \cdots - a_{n1}p_n &= r_1, \\
-a_{12}p_1 + (1 - a_{22})p_2 - \cdots - a_{n2}p_n &= r_2, \\
&\ \ \vdots \\
-a_{1n}p_1 - a_{2n}p_2 - \cdots + (1 - a_{nn})p_n &= r_n.
\end{aligned}
\tag{1-6}
$$

This set of equations looks quite similar to (1-4). The same coefficients 
$a_{ij}$ appear. However, now they appear in transposed order. Intuitively,
it would seem that once the value added is specified for each product in 
the economy, the prices have been determined. 
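
Both the quantity equations (1-4) and the price equations (1-6) can be solved mechanically once the coefficients are known. The sketch below is only an illustration: the two-industry coefficients, demands, and values added are hypothetical numbers, not data from Leontief's work, and the helper names are invented here. It uses exact rational arithmetic and plain Gauss-Jordan elimination from the Python standard library.

```python
from fractions import Fraction


def solve(matrix, rhs):
    """Solve a square linear system exactly by Gauss-Jordan elimination."""
    n = len(rhs)
    aug = [[Fraction(v) for v in row] + [Fraction(b)]
           for row, b in zip(matrix, rhs)]
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero pivot.
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("system has no unique solution")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        # Clear this column in every other row.
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n] for row in aug]


def leontief_outputs(a, b):
    """Production levels x_j from Eq. (1-4): entry (i, j) is delta_ij - a_ij."""
    n = len(b)
    system = [[(1 if i == j else 0) - a[i][j] for j in range(n)]
              for i in range(n)]
    return solve(system, b)


def leontief_prices(a, r):
    """Prices p_j from Eq. (1-6): the same coefficients, transposed."""
    n = len(r)
    system = [[(1 if i == j else 0) - a[j][i] for j in range(n)]
              for i in range(n)]
    return solve(system, r)


# Hypothetical two-industry economy (assumed values for illustration).
a = [[Fraction(1, 5), Fraction(3, 10)],
     [Fraction(2, 5), Fraction(1, 10)]]
demand = [56, 32]                                 # exogenous demands b_i
value_added = [Fraction(2, 5), Fraction(21, 10)]  # values added r_j

outputs = leontief_outputs(a, demand)
prices = leontief_prices(a, value_added)
print(outputs, prices)
```

With these assumed numbers the outputs come to 100 and 80 units and the prices to 2 and 3. Substituting them back into (1-2) and (1-5) confirms the balance, for example $100 = 0.2(100) + 0.3(80) + 56$.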

We have presented the bare outlines of a static model of an economy 
(static because, for the period under consideration, nothing is allowed to 
change with time). The model is linear. It permits us to determine the 
quantities to be produced in terms of the exogenous demands, and the 
prices in terms of the values added. This model has been found useful in 
studies to determine whether the United States economy could meet 
certain wartime demands with a given labor supply, and in investigating 
the influence of a price change in one industry on prices in other industries. 
The basic Leontief model has been generalized in many different ways 
which we need not discuss here. 


* The upper-case Greek sigma ($\Sigma$) is a summation sign and, by definition,

$$ \sum_{i=1}^{n} x_i = x_1 + x_2 + \cdots + x_n, \qquad \sum_{i=1}^{n} x_i^2 = x_1^2 + \cdots + x_n^2, $$

$$ \sum_{i=1}^{n} a_{ij}p_i = a_{1j}p_1 + \cdots + a_{nj}p_n. $$

This notation will be used frequently to simplify the writing of summations.

1-4 Linear programming. Linear programming, a linear model developed
within the last twelve years, has attracted wide attention. It
has been applied to a wide variety of problems, for example: programming 
of petroleum refinery operations, determination of optimal feed mixes for 
cattle, assignment of jobs to machines in manufacturing, etc. We shall 
show how linear programming problems can arise in practice by studying 
a typical, albeit oversimplified, example. 

Let us consider a shop with three types of machines, A, B, and C,
which can turn out four products, 1, 2, 3, 4. Any one of the products has 
to undergo some operation on each of the three machines (a lathe, drill, 
and milling machine, for example). We shall assume that the production 
is continuous, and that each product must first go on machine A, then B, 
and finally C. Furthermore, we shall assume that the time required for 
adjusting the setup of each machine to a different operation, when production
shifts from one product to another, is negligible. Table 1-1 shows:
(1) the machine hours for each machine per unit of each product; (2) the 
machine hours available per year; (3) the profit realized on the sale of one 
unit of any one of the products. It is assumed that the profit is directly 
proportional to the number of units sold; we wish to determine the yearly 
optimal output for each product in order to maximize profits. 


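Table 1-1

Machine        Hours per unit of product              Hours available
               1        2        3        4           per year
A              1.5      1        2.4      1           2000
B              1        5        1        3.5         8000
C              1.5      3        3.5      1           5000
Unit profit    5.24     7.30     8.34     4.18
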
Examination of Table 1-1 shows that the item with the highest unit 
profit requires a considerable amount of time on machines A and C; the 
product with the second-best unit profit requires relatively little time on 
machine A and slightly less time on machine C than the item with the 
highest unit profit. The product with the lowest unit profit requires a 
considerable amount of time on machine B and relatively little time on C. 
This cursory examination indicates that the maximum profit will not be 
achieved by restricting production to a single product. It would seem that
at least two of them should be made. It is not too obvious, however,
what the best product mix is. 

Suppose $x_j$ is the number of units of product $j$ produced per year. It
is of interest to find the values of $x_1$, $x_2$, $x_3$, $x_4$ which maximize the total
profit. Since the available machine time is limited, we cannot arbitrarily
increase the output of any one product. Production must be allocated
among products 1, 2, 3, 4 so that profits will be maximized without exceeding
the maximum number of machine hours available on any one
machine.

Let us first consider the restrictions imposed by the number of machine 
hours. Machine A is in use a total of 

$$ 1.5x_1 + x_2 + 2.4x_3 + x_4 \quad \text{hours per year,} $$

since 1.5 hours are required for each unit of product 1, and $x_1$ units
of product 1 are produced; and so on, for the remaining products. Hence
the total time used is the sum of the times required to produce each
product. The total amount of time used cannot be greater than 2000 hours.
Mathematically, this means that

$$ 1.5x_1 + x_2 + 2.4x_3 + x_4 \leq 2000. \tag{1-7} $$

It would not be correct to set the total hours used equal to 2000 (for 
machine A) since there may not be any combination of production rates 
that would use each of the three machines to capacity. We do not wish 
to predict which machines will be used to capacity. Instead, we introduce 
a "less than or equal to" sign; the solution of the problem will indicate
which machines will be used at full capacity. 

For machines B and C we can write 

$$ x_1 + 5x_2 + x_3 + 3.5x_4 \leq 8000 \quad \text{(machine B)}, \tag{1-8} $$

$$ 1.5x_1 + 3x_2 + 3.5x_3 + x_4 \leq 5000 \quad \text{(machine C)}. \tag{1-9} $$

Since no more than the available machine time can be used, the variables 
$x_j$ must satisfy the above three inequalities. Furthermore, we cannot
produce negative quantities; that is, we have either a positive amount of
any product or none at all. Thus the additional restrictions

$$ x_1 \geq 0, \quad x_2 \geq 0, \quad x_3 \geq 0, \quad x_4 \geq 0 \tag{1-10} $$

require that the variables be non-negative.

We have now determined all the restrictions on the variables. If $x_j$
units of product $j$ are produced, the yearly profit $z$ is

$$ z = 5.24x_1 + 7.30x_2 + 8.34x_3 + 4.18x_4. \tag{1-11} $$





We wish to find values of the variables which satisfy restrictions (1-7)
through (1-10) and which maximize the profit (1-11). This is a linear 
programming problem. 
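
For a problem this small, the restrictions (1-7) through (1-10) and the profit (1-11) can be explored by brute force. The sketch below is not the simplex method; it is an assumed, illustrative approach that relies on the fact (shown for two variables in Section 1-5) that an optimum, when one exists, is attained at a corner of the feasible region. It tries every choice of four constraints treated as equalities, keeps the points that satisfy all seven restrictions, and picks the most profitable one. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

F = Fraction


def solve(matrix, rhs):
    """Exact Gauss-Jordan solve; returns None if the system is singular."""
    n = len(rhs)
    aug = [[F(v) for v in row] + [F(b)] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            return None
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n] for row in aug]


def best_vertex(rows, bounds, profit):
    """Enumerate corner points of {x : rows x <= bounds}; return (z, x) of the best."""
    n = len(profit)
    best = None
    for active in combinations(range(len(rows)), n):
        x = solve([rows[i] for i in active], [bounds[i] for i in active])
        if x is None:
            continue  # the chosen constraints do not meet in a single point
        if any(sum(a * v for a, v in zip(row, x)) > b
               for row, b in zip(rows, bounds)):
            continue  # violates some restriction, so not feasible
        z = sum(c * v for c, v in zip(profit, x))
        if best is None or z > best[0]:
            best = (z, x)
    return best


# Machine-hour rows of (1-7), (1-8), (1-9), then -x_j <= 0 for (1-10).
hours = [[F("1.5"), 1, F("2.4"), 1],
         [1, 5, 1, F("3.5")],
         [F("1.5"), 3, F("3.5"), 1]]
rows = hours + [[-1 if i == j else 0 for j in range(4)] for i in range(4)]
bounds = [2000, 8000, 5000, 0, 0, 0, 0]
profit = [F("5.24"), F("7.30"), F("8.34"), F("4.18")]

best_z, best_x = best_vertex(rows, bounds, profit)
print([float(v) for v in best_x], float(best_z))
```

Under these assumptions the enumeration gives $x_1 = 5000/17 \approx 294.1$, $x_2 = 1500$, $x_3 = 0$, $x_4 = 1000/17 \approx 58.8$, with a profit of about 12,737.06, and all three machines turn out to be used to capacity. Enumeration of this kind grows very quickly with the number of variables and constraints, so it is useful only as a check on small examples.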

A general linear programming problem seeks to determine the values
of $r$ non-negative variables $x_j$ which will yield the largest value of $z$,

$$ z = c_1x_1 + \cdots + c_rx_r, \tag{1-12} $$

for those sets of non-negative $x_j$ which satisfy a set of $m$ linear inequalities
or equations of the form

$$ a_{i1}x_1 + \cdots + a_{ir}x_r \ \{\leq,\ =,\ \geq\}\ b_i, \qquad i = 1, \ldots, m. \tag{1-13} $$

One and only one of the signs $\leq$, $=$, $\geq$ holds for each constraint, but
the sign can vary from one constraint to another. The value of $m$ can be
greater than, less than, or equal to, $r$. The linear function (1-12) is called
the objective function; the linear inequalities (1-13) are the constraints.
A set of non-negative variables $(x_1, \ldots, x_r)$ which satisfies (1-13) is a
feasible solution; an optimal feasible solution maximizes $z$ in (1-12). A
linear programming problem is solved when an optimal feasible solution
to the problem has been found.


1-5 Graphical solution of a linear programming problem in two variables.
Linear programming problems involving only two variables can
be solved graphically. Consider, for example:

$$
\begin{aligned}
3x_1 + 5x_2 &\leq 15, \\
5x_1 + 2x_2 &\leq 10, \\
x_1, x_2 &\geq 0, \\
\max z &= 5x_1 + 3x_2.
\end{aligned}
\tag{1-14}
$$


First, we shall find the sets of numbers $(x_1, x_2)$ which are feasible solutions
to the problem. We introduce an $x_1x_2$-coordinate system and note
that any set of numbers $(x_1, x_2)$ represents a point in the $x_1x_2$-plane. All
points $(x_1, x_2)$ lying on or to the right of the $x_2$-axis have $x_1 \geq 0$. Similarly,
all points lying on or above the $x_1$-axis have $x_2 \geq 0$. Hence any
point lying in the first quadrant has $x_1, x_2 \geq 0$ and thus satisfies the
non-negativity restrictions. Any point which is a feasible solution must lie
in the first quadrant.

To find the set of points satisfying the constraints, we must interpret
geometrically such inequalities as $3x_1 + 5x_2 \leq 15$. If the equal sign
holds, then $3x_1 + 5x_2 = 15$ is the equation for a straight line, and any
point on this line satisfies the equation. Now consider the point $(0, 0)$,
that is, the origin. We observe that $3(0) + 5(0) = 0 < 15$; hence the
origin also satisfies the inequality. In fact, any point lying on or below
the line $3x_1 + 5x_2 = 15$ satisfies $3x_1 + 5x_2 \leq 15$. However, no point
lying above the line satisfies the inequality. Therefore, the set of points
satisfying the inequality $3x_1 + 5x_2 \leq 15$ consists of all the points in the
$x_1x_2$-plane lying on or below the line $3x_1 + 5x_2 = 15$. However, not all
these points satisfy the non-negativity restriction and this inequality;
only the points in the first quadrant lying on or below the line
$3x_1 + 5x_2 = 15$ fulfill both conditions. By analogy, all points in the first
quadrant lying on or below the line $5x_1 + 2x_2 = 10$ satisfy the inequality
$5x_1 + 2x_2 \leq 10$ and the restriction of non-negativity.

The set of points satisfying both inequalities ($3x_1 + 5x_2 \leq 15$,
$5x_1 + 2x_2 \leq 10$) and the non-negativity restriction is represented by the darkly
shaded region in Fig. 1-1. Any point in this region is a feasible solution,
and only the points in this region are feasible solutions.

Nothing has been said so far about the objective function: To solve
our problem, we must find the point or points in the region of feasible
solutions which will yield the maximum value of the objective function.
For any fixed value of $z$, $z = 5x_1 + 3x_2$ is a straight line. Any point on
this line will give the same value of $z$. For each different value of $z$, we
obtain a different line. It is important to note that all the lines representing
the different values of $z$ are parallel since the slope of any line
$z = c_1x_1 + c_2x_2$ is $-c_1/c_2$, and hence is independent of $z$. In our problem,
$c_1$, $c_2$ are fixed, and the lines are parallel.

We wish to find the line of maximum $z$ which has at least one point in
common with the region of feasible solutions. The lines in Fig. 1-1 represent
the objective function for three different values of $z$. Clearly, $z_1$ is
not the maximum value of $z$: The line can be moved up (this increases $z$)
and still have some points in the region of feasible solutions. Although
$z_3 > z_2$ and $z_1$, the line representing $z_3$ has no point in common with the
region of feasible solutions and thus does not satisfy our premise. Hence
$z_2$ is the maximum value of $z$; the feasible solution which yields this value
is the corner $A$ of the region of feasible solutions.

Figure 1-1

In Fig. 1-1, the approximate values of the variables for the optimal
solution are $x_1 = 1$, $x_2 = 2.4$. To find the exact values, we note that
the point representing the optimal solution is the intersection of the lines
$3x_1 + 5x_2 = 15$, $5x_1 + 2x_2 = 10$. Solving these two equations simultaneously,
we obtain $x_1 = 1.053$, $x_2 = 2.368$. Substitution of these
values into the objective function yields the maximum value of $z = 12.37$.
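
The arithmetic for problem (1-14) can be reproduced in a few lines. This is a minimal sketch under the assumption that the four corners of the feasible region are the origin, the two axis intercepts that satisfy both constraints, and the intersection of the two constraint lines; the function names are hypothetical. The intersection is found by Cramer's rule in exact fractions.

```python
from fractions import Fraction


def intersect(line_a, line_b):
    """Intersection of a1*x1 + a2*x2 = c and b1*x1 + b2*x2 = d (Cramer's rule)."""
    (a1, a2, c), (b1, b2, d) = line_a, line_b
    det = a1 * b2 - a2 * b1
    if det == 0:
        raise ValueError("lines are parallel")
    return Fraction(c * b2 - a2 * d, det), Fraction(a1 * d - c * b1, det)


def objective(point):
    """z = 5*x1 + 3*x2, the objective function of problem (1-14)."""
    x1, x2 = point
    return 5 * x1 + 3 * x2


def feasible(point):
    """True if the point satisfies every restriction of problem (1-14)."""
    x1, x2 = point
    return (x1 >= 0 and x2 >= 0
            and 3 * x1 + 5 * x2 <= 15
            and 5 * x1 + 2 * x2 <= 10)


corner = intersect((3, 5, 15), (5, 2, 10))   # corner A in Fig. 1-1
candidates = [(0, 0), (2, 0), (0, 3), corner]
best = max((p for p in candidates if feasible(p)), key=objective)

print(corner, float(objective(corner)))
```

The exact corner is $x_1 = 20/19$, $x_2 = 45/19$, giving $z = 235/19 \approx 12.37$, in agreement with the rounded values quoted in the text; the other three corners give $z = 0$, $10$, and $9$.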

In our example of a linear programming problem, several features have
emerged which merit further discussion. First of all, there is an infinite
number of feasible solutions which form a region in the $x_1x_2$-plane. This
region has straight boundaries and some corners; geometrically speaking,
it is a convex polygon. For any fixed value of $z$, the objective function
is a straight line. The lines corresponding to different values of $z$ are
parallel. The maximum value of $z$ is represented by the line with the
largest value of $z$ which has at least one point in common with the polygon
of feasible solutions. The optimal solution occurred at a corner of the
polygon. Interestingly enough, the same characteristics are found in
general linear programming problems with a feasible solution $(x_1, \ldots, x_r)$
representing a point in an $r$-dimensional space.

In 1947, George Dantzig developed an iterative algebraic technique 
(simplex method) which provides exact solutions to any linear programming
problem (it is not an approximation method). Since a very considerable
number of arithmetical operations is required to find an optimal
feasible solution to a problem involving many variables and constraints, 
large-scale digital computers are used to solve these problems. 

1-6 Regression analysis. It is often necessary (particularly in the 
fields of engineering and economics) to determine empirically the best 
formula relating one variable to another set of variables. Economists 
and engineers often start with the assumption that a linear relation exists 
between the “dependent” variable and the “independent” variables. For 
example, if $y$ is the dependent variable and $x_1, \ldots, x_n$ are the independent
variables, then a relation of the form

$$ y = \alpha_1x_1 + \cdots + \alpha_nx_n \tag{1-15} $$

might be used to explain $y$ in terms of the $x_j$. The $\alpha_j$ are assumed to be
constants; the best $\alpha_j$ are to be determined from experimental or historical
data. For example, we may wish to relate the demand for automobiles, $y$,
in a given year to the national income for the previous year, $x_1$, the change
in national income over the last two years, $x_2$, the demand for automobiles
in the past year, $x_3$, etc. If, after determining the $\alpha_j$, comparison
shows that the computed demand $y$ (1-15) is a sufficiently close approximation
of the actual demand, then formula (1-15) may be used to predict
from current data sales of automobiles for the coming year.

We have not yet defined the meaning of the "best $\alpha_j$" in (1-15) or shown
how these values are computed. Let us suppose that, from a series of
experimental or historical data, we obtain $k$ ($k > n$) sets of data which
yield $y$ for a given set of values of $x_1, \ldots, x_n$. The $i$th set of data is denoted
by $y(i), x_1(i), \ldots, x_n(i)$. For any given set of $\alpha_j$ in (1-15), let $\hat{y}(i)$
be the value of $y$ computed by using $x_1(i), \ldots, x_n(i)$ in formula (1-15).
The criterion most frequently used to determine the best set of $\alpha_j$ requires
that the sum of the squares of the differences between the measured
values of $y$ and the value computed for the same set of $x_j$ by means of
Eq. (1-15) be minimal; that is, we wish to find the $\alpha_j$ which minimize $z$
when

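$$ z = \sum_{i=1}^{k} \left[ y(i) - \hat{y}(i) \right]^2. \tag{1-16} $$

But

$$ \hat{y}(i) = \sum_{j=1}^{n} \alpha_jx_j(i), \tag{1-17} $$

and

$$ z = \sum_{i=1}^{k} \left[ y(i) - \sum_{j=1}^{n} \alpha_jx_j(i) \right]^2 = \sum_{i=1}^{k} \left\{ y^2(i) - 2y(i) \sum_{j=1}^{n} \alpha_jx_j(i) + \left[ \sum_{j=1}^{n} \alpha_jx_j(i) \right]^2 \right\}. \tag{1-18} $$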


Thus, $z$ has been expressed in terms of the $\alpha_j$ and the given data, and hence
the $\alpha_j$ are the only variables.

The set of $\alpha_j$ which minimizes $z$ in (1-18) must satisfy* the following
set of $n$ simultaneous linear equations:

$$ \sum_{j=1}^{n} u_{qj}\alpha_j = v_q, \qquad q = 1, \ldots, n, \tag{1-19} $$

where

$$ v_q = \sum_{i=1}^{k} y(i)x_q(i), \qquad u_{qj} = \sum_{i=1}^{k} x_q(i)x_j(i). \tag{1-20} $$

Thus, to determine the $\alpha_j$, a set of linear equations must be solved.

* The reader familiar with calculus will note that the necessary conditions
to be satisfied by a set of $\alpha_j$ which minimizes $z$ are $\partial z/\partial \alpha_j = 0$, $j = 1, \ldots, n$.
These $n$ partial derivatives yield the $n$ linear equations (1-19).

The material discussed in this section is part of statistical regression 
analysis and is used extensively in that branch of economics called econometrics.
In engineering, the technique for determining the best $\alpha_j$ is
often referred to as finding the best “least-squares” fit. The regression or
least-squares techniques lead to a set of simultaneous linear equations
to be solved for the best $\alpha_j$. Our starting point was a linear model, since
we assumed that $y$ could be related to the $x_j$ by (1-15).
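
The passage from data to the linear equations (1-19) can be traced with a small computation. The sketch below assumes three hypothetical observations of two independent variables (so $k = 3$, $n = 2$); the data and the function names are invented for illustration. It forms the sums of products described in the text and solves the resulting equations exactly.

```python
from fractions import Fraction


def solve(matrix, rhs):
    """Solve a square linear system exactly by Gauss-Jordan elimination."""
    n = len(rhs)
    aug = [[Fraction(v) for v in row] + [Fraction(b)]
           for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("system has no unique solution")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n] for row in aug]


def normal_equations(xs, ys):
    """Coefficients and right-hand sides of the n linear equations (1-19).

    u[q][j] is the sum over the data sets of x_q(i) * x_j(i);
    v[q] is the sum over the data sets of y(i) * x_q(i).
    """
    n = len(xs[0])
    u = [[sum(row[q] * row[j] for row in xs) for j in range(n)]
         for q in range(n)]
    v = [sum(y * row[q] for row, y in zip(xs, ys)) for q in range(n)]
    return u, v


def sum_of_squares(alphas, xs, ys):
    """The quantity z of Eq. (1-16) for a given set of coefficients."""
    return sum((y - sum(a * x for a, x in zip(alphas, row))) ** 2
               for row, y in zip(xs, ys))


# Hypothetical data: each row holds x_1(i), x_2(i); ys holds the measured y(i).
xs = [(1, 0), (0, 1), (1, 1)]
ys = [2, 3, 6]

u, v = normal_equations(xs, ys)
alphas = solve(u, v)
print(alphas, sum_of_squares(alphas, xs, ys))
```

For these assumed data the equations are $2\alpha_1 + \alpha_2 = 8$ and $\alpha_1 + 2\alpha_2 = 9$, giving $\alpha_1 = 7/3$, $\alpha_2 = 10/3$ and a minimum $z = 1/3$. Any other choice, such as $\alpha_1 = 2$, $\alpha_2 = 3$, gives a larger sum of squares.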


1-7 Linear circuit theory. Linear circuit theory provides an excellent 
example of a linear model used in engineering. For d-c (direct-current) 
circuits, the fundamental linearity assumption is known as Ohm's law 
which states that the voltage drop E across a conductor is proportional 
to the current $I$ flowing through the conductor. The constant of proportionality
is called the resistance $R$. Resistance is a property of the material
through which the current is passing. Thus E = IR. The equation 
indicates the homogeneity property of linearity. The additivity property 
is also present: If the same current passes through a resistance network 
of two resistors connected in series, the voltage drop across the network 
is the sum of the individual voltage drops across each resistor. 

Let us consider the d-c circuit in Fig. 1-2. Suppose that we wish to
find the values of all the labeled currents. The symbol [battery symbol] represents
a battery; if the current moves from the plus terminal of the battery
to the minus terminal, a voltage drop of $E$ occurs (any internal resistance
of the battery is here neglected). The currents in each part of the circuit
can be determined by two rules known as Kirchhoff's laws. These laws
simply state that (1) the algebraic sum of the currents at any node (point
$A$ in Fig. 1-2) is zero, that is, there is conservation of current at a node;
(2) the algebraic sum of the voltages (drops and rises) around a loop is
zero.

Figure 1-2
We wish to determine the six currents shown in Fig. 1-2. We shall
assume arbitrarily that the currents flow in the direction of the arrows. 
If our choice of directions should prove incorrect, the solution will yield 
negative values for the current. Intuitively, we feel that six independent 
equations will be needed to determine the six currents. Let us obtain three 
equations using the three loops shown in Fig. 1-2. From loop 1 

$$E_1 - I_3R_3 + E_3 - I_6R_6 - I_1R_7 - I_1R_1 = 0$$

or 

$$I_3R_3 + I_6R_6 + I_1(R_1 + R_7) = E_1 + E_3. \tag{1-21}$$

Moving in a direction opposite to the assumed current flow, we obtain a 
voltage rise which is the negative of the voltage drop occurring in the 
direction of the current flow. For loop 2 

$$-E_2 + I_2R_2 + I_4R_4 - E_3 + I_3R_3 = 0,$$

or 

$$I_2R_2 + I_4R_4 + I_3R_3 = E_2 + E_3; \tag{1-22}$$

for loop 3 

$$E_5 - I_5R_5 + I_6R_6 - I_4R_4 = 0$$

or 

$$I_5R_5 - I_6R_6 + I_4R_4 = E_5. \tag{1-23}$$


Many other loops can be considered, such as a combination of 1 and 2, 
or 2 and 3, etc. However, these do not yield independent equations. All 
other loop equations can be obtained from (1-21) through (1-23) (prove!). 

The remaining three equations are found by means of the first Kirchhoff 
law concerning the conservation of flow at the nodes. Using nodes A, 
B, C, we obtain 

$$I_1 + I_2 - I_3 = 0 \quad \text{(node A)}, \tag{1-24}$$

$$I_3 - I_4 - I_6 = 0 \quad \text{(node B)}, \tag{1-25}$$

$$I_5 + I_6 - I_1 = 0 \quad \text{(node C)}. \tag{1-26}$$

Equations (1-21) through (1-26) provide a set of six simultaneous linear 
equations to be solved for the currents. Here again a linear model leads 
to sets of linear equations. 
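
As a rough numerical illustration, the six equations (1-21) through (1-26) can be solved by elimination. The sketch below is only one way of doing so: the resistances and battery voltages are hypothetical values chosen for the example (Fig. 1-2 gives none), and `solve_linear_system` is a helper written here, not a routine from the text.

```python
def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    # Work on an augmented copy so the caller's data is left untouched.
    M = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        # Use the row with the largest entry in this column as the pivot row.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("the equations are not independent")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution on the triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - known) / M[r][r]
    return x


# Hypothetical component values (ohms and volts); not taken from the text.
R1, R2, R3, R4, R5, R6, R7 = 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0
E1, E2, E3, E5 = 10.0, 5.0, 8.0, 6.0

# Columns correspond to I1..I6; rows to equations (1-21)..(1-26).
A = [
    [R1 + R7, 0, R3, 0, 0, R6],   # loop 1
    [0, R2, R3, R4, 0, 0],        # loop 2
    [0, 0, 0, R4, R5, -R6],       # loop 3
    [1, 1, -1, 0, 0, 0],          # node A
    [0, 0, 1, -1, 0, -1],         # node B
    [-1, 0, 0, 0, 1, 1],          # node C
]
b = [E1 + E3, E2 + E3, E5, 0, 0, 0]

currents = solve_linear_system(A, b)
for k, value in enumerate(currents, start=1):
    print(f"I{k} = {value:+.4f}")
```

A negative value printed for any current would mean that the direction assumed for that current was opposite to the actual flow.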

Linear circuit theory is also very important for the analysis of a-c (alternating 
current) circuits where the currents and voltages vary sinusoidally 
with time. Again, we obtain sets of simultaneous linear equations which, 
however, involve complex rather than real numbers. The theory of linear 
a-c circuits provides an excellent example of the usefulness of complex 
numbers in engineering. Although they are not introduced specifically 
in this book (except in some of the problems), the material developed 
applies to both complex and real numbers. 

1-8 Other linear models. Linear models and hence linear algebra are 
valuable tools in the solving of a wide range of problems. Linear algebra 
is used, for example, in: 

(1) Linear differential and difference equations to deal with sets of simultaneous 
linear differential or difference equations; 

(2) Linear vibration theory to determine the characteristic frequencies of 
vibration (of atoms in molecules, for example); 

(3) Statistics, probability theory, and related areas dealing with noise 
and stochastic processes (the theory of Markov processes in probability 
theory provides an excellent example); 

(4) Transformation theory in quantum mechanics to carry out transformations 
from one representation of an operator to another (here, complex 
numbers appear and have to be dealt with); 

(5) Establishing sufficient conditions in the classical theory of maxima and 
minima (this theory finds important applications in economics, especially 
in the classical theory of production and consumer behavior); 

(6) Rigid body mechanics to simplify the theoretical treatment of the 
motion of a rigid body, such as a gyroscope. 

This list and the examples given in the preceding sections by no means 
exhaust the applicability of linear models; however, they 
furnish sufficient proof for the importance and usefulness of linear models 
and of linear algebra. 

1-9 The road ahead. The n variables in a set of simultaneous linear 
equations (1-1) can be thought of as a point in an n-dimensional space. 
In Chapter 2, a particularly useful n-dimensional space, the euclidean 
space, is introduced. The algebraic properties of points in this space 
are studied. Chapter 3 discusses matrices and determinants. Matrices 
enable us to deal efficiently with linear models from both a theoretical 
and practical standpoint, since they greatly simplify the mathematical 
manipulation of linear relations. 
Chapter 4 continues the discussion of matrices and presents the notion 
of linear transformations. This concept supplies the key to the general 
meaning of linear models. The notion of rank is also introduced. Chapter 5 
deals with the theory of simultaneous linear equations. It develops the 
conditions which determine whether a set of equations has a solution, 
and whether this solution is unique. 

Chapter 6 discusses the geometry of n dimensions and introduces the 
notions of lines, planes, spheres, regions, etc. It outlines the theory of 
convex sets and its applicability to the study of many linear models of 
which linear programming is an outstanding example. The chapter concludes 
with a brief discussion of a particular class of convex sets known as 
convex cones. These are useful in examining generalized Leontief models 
in economics and in linear programming. 

Finally, Chapter 7 discusses the subject of characteristic values. This 
material facilitates greatly the study of linear vibrations, linear differential 
and difference equations, and Markov processes—to mention only a few 
applications. Nonlinear expressions called quadratic forms are also introduced; 
they are important in developing sufficient conditions for maxima 
and minima of a function of a number of variables. The techniques of 
linear algebra provide a very powerful tool for analyzing quadratic forms. 


References 


Leontief models: 

(1) W. W. Leontief, The Structure of the American Economy, 1919-1939. 
2d ed., New York: Oxford, 1951. 

(2) R. Dorfman, P. Samuelson, and R. Solow, Linear Programming and 
Economic Analysis. New York: McGraw-Hill, 1958. 

(3) O. Morgenstern, Ed., Economic Activity Analysis. New York: Wiley, 
1954. 

Linear programming: 

(1) S. Gass, Linear Programming: Methods and Applications. New York: 
McGraw-Hill, 1958. 

(2) G. Hadley, Linear Programming. Reading, Mass.: Addison-Wesley, 1961. 

Regression analysis: 

(1) D. Fraser, Statistics: An Introduction. New York: Wiley, 1958. This 
entry is only an example. Almost any statistics text discusses this topic. 

(2) L. Klein, A Textbook of Econometrics. Evanston: Row, Peterson, 1953. 

Linear network analysis: 

(1) E. Guillemin, Introductory Circuit Theory. New York: Wiley, 1953. 

(2) P. LeCorbeiller, Matrix Analysis of Electrical Networks. New York: 
Wiley, 1950. 




Linear differential equations: 

(1) W. Kaplan, Ordinary Differential Equations. Reading, Mass.: Addison- 
Wesley, 1958. 

Linear difference equations: 

(1) F. Hildebrand, Methods of Applied Mathematics. Englewood Cliffs: 
Prentice-Hall, 1952. 

Linear vibrations: 

(1) L. Brillouin, Wave Propagation in Periodic Structures. New York: 
Dover, 1953. 

(2) H. Corben and P. Stehle, Classical Mechanics. New York: Wiley, 1950. 

Markov processes: 

(1) J. Kemeny and J. Snell, Finite Markov Chains. Princeton: Van Nostrand, 
1960. 

Transformation theory in quantum mechanics: 

(1) J. Frenkel, Wave Mechanics, Advanced General Theory. New York: 
Dover, 1950. 

Maxima and minima with applications to economics: 

(1) P. Samuelson, Foundations of Economic Analysis. Cambridge: Harvard 
University Press, 1955. 

Rigid body mechanics: 

(1) H. Goldstein, Classical Mechanics. Reading, Mass.: Addison-Wesley, 
1951. 


Problems 

Discuss the physical or economic meaning of linearity for each of the four 
models presented in this chapter. 


CHAPTER 2 


VECTORS 

“. . . Arrows of outrageous fortune.” 

Shakespeare—Hamlet. 

2-1 Physical motivation for the vector concept. Vectors are frequently 
used in many branches of pure and applied mathematics and in the physical 
and engineering sciences. The vector analysis applied by the physicist 
or engineer to problems in their fields differs in many ways from the 
n-dimensional vector spaces used in pure mathematics. Both types, 
however, have a common intuitive foundation. The need for a vector 
concept arose very naturally in mechanics. The force on a body, for 
example, cannot in general be completely described by a single number. 
Force has two properties, magnitude and direction, and therefore requires 
more than a single number for its description. Force is an example of a 
vector quantity. At the most elementary level in physics, a vector is defined 
as a quantity which has both magnitude and direction. A large 
number of physical quantities are vectors and, interestingly enough, the 
same laws of operation apply to all vectors. 

Vectors are often represented geometrically by a line with an arrowhead 
on the end of it. The length of the line indicates the magnitude of the vector, 
and the arrow denotes its direction. When representing vectors 
geometrically as directed line segments, it is desirable to introduce a 
coordinate system as a reference for directions and as a scale for lengths. 
Familiar rectangular coordinates will be used. Usually, the coordinate 
axes will be named $x_1$, $x_2$, $x_3$. Some vectors lying in a plane are shown 
in Fig. 2-1. 

We shall often represent a vector by a directed line segment. It should 
be stressed that a vector is not a number. If we are considering vectors 
lying in a plane, then two numbers are needed to describe any vector: one 
for its magnitude and another giving its direction (the angle it makes 
with one of the coordinate axes). If vectors in three-dimensional space 
are being studied, three numbers are needed to describe any vector: one 
number for its magnitude, and two numbers to denote its orientation with 
respect to some coordinate system. In physics, the symbol associated with 
a vector usually has a mnemonic connotation, such as f for force, a for 
acceleration, etc. Since we shall not be referring to any particular physical 
quantities, we shall simply use arbitrary lower-case boldface symbols for 
vectors, a, b, x, y. Boldface type implies that the symbol does not stand 
for a number. 
In general, a vector may originate at any point in space and terminate 
at any point. Physical quantities such as force are, of course, independent 
of where one places the vector in a coordinate system. They depend only 
on the magnitude and direction of the vector. For this reason, it is convenient 
to have a vector start always at the origin of the coordinate 
system (a in Fig. 2-1). In this book we shall adopt the convention that 
all vectors begin at the origin of the coordinate system rather than at some 
other point in space. This convention provides some simplifications in 
dealing with vectors. 

Consider the vector in Fig. 2-2. It will be observed that by specifying 
point $(a_1, a_2, a_3)$, that is, the point where the head of the vector terminates, 
we have completely characterized the vector. Its magnitude (in some 
physical units) is 

$$[a_1^2 + a_2^2 + a_3^2]^{1/2},$$

and its direction is characterized by the two angles $\theta$ and $\phi$, where 

$$\tan\theta = \frac{a_2}{a_1}, \qquad \cos\phi = \frac{a_3}{[a_1^2 + a_2^2 + a_3^2]^{1/2}}.$$

Thus, there is a one-to-one correspondence between all points in space 
and all vectors which emanate from the origin. For any given point 
$(a_1, a_2, a_3)$, a corresponding unique vector $\mathbf{a}$ can be drawn from the origin 
to this point. Conversely, for any given vector $\mathbf{a}$, there is a unique point 
$(a_1, a_2, a_3)$ which is the point where the vector terminates. Because of 
this correspondence between vectors and points we can write 

$$\mathbf{a} = (a_1, a_2, a_3). \tag{2-1}$$

Equation (2-1) means that $\mathbf{a}$ is the vector drawn from the origin to the 
point $(a_1, a_2, a_3)$ with coordinates $a_1$, $a_2$, $a_3$. 

The correspondence between vectors and points in space is fundamental 
and important because of its practical value in vector computations; 
it also offers the key to more abstract generalizations of the vector concept. 

Examples: (1) $\mathbf{a} = (1, 2, 3)$ is the vector drawn from the origin to the 
point $x_1 = 1$, $x_2 = 2$, $x_3 = 3$. (2) $\mathbf{a} = (3, 4)$ is a vector lying in a plane 
drawn from the origin to the point $x_1 = 3$, $x_2 = 4$. (3) $\mathbf{a} = (1, 0, 0)$ 
is the vector drawn along the $x_1$-axis to $x_1 = 1$, $x_2 = 0$, $x_3 = 0$. 
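
The correspondence between a point and the magnitude and direction of its vector can be checked numerically. The following sketch computes the magnitude and the two angles for a three-component vector and then rebuilds the point from them. Two things here go beyond the text and are assumptions: `atan2` is used instead of a bare arctangent so that the quadrant of $\theta$ is resolved, and the rebuilding formulas take $\theta$ as the angle in the $x_1x_2$-plane and $\phi$ as the angle measured from the $x_3$-axis. The function names are our own.

```python
import math


def magnitude(a):
    """Length of the vector drawn from the origin to the point a."""
    return math.sqrt(sum(x * x for x in a))


def direction_angles(a):
    """Return (theta, phi) with tan(theta) = a2/a1 and cos(phi) = a3/|a|.

    The zero vector has no direction, so it is not handled here.
    """
    a1, a2, a3 = a
    theta = math.atan2(a2, a1)          # quadrant-aware form of tan(theta) = a2/a1
    phi = math.acos(a3 / magnitude(a))
    return theta, phi


def point_from(length, theta, phi):
    """Rebuild the terminal point from magnitude and direction (assumed convention)."""
    return (
        length * math.sin(phi) * math.cos(theta),
        length * math.sin(phi) * math.sin(theta),
        length * math.cos(phi),
    )


a = (1.0, 2.0, 3.0)
length = magnitude(a)
theta, phi = direction_angles(a)
rebuilt = point_from(length, theta, phi)

print(length, theta, phi)
print(rebuilt)  # agrees with a up to round-off
```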

2-2 Operations with vectors. We now shall consider the operations 
that can be performed with vectors. First, from an intuitive point of view, 
two vectors are equal if they have the same magnitude and direction. 
According to our point representation, this means that both vectors are 
drawn from the origin to the same point in space, that is, they are coincident. 
Thus, if 

$$\mathbf{a} = (a_1, a_2, a_3), \qquad \mathbf{b} = (b_1, b_2, b_3),$$

then 

$$\mathbf{a} = \mathbf{b},$$

if and only if 

$$a_1 = b_1, \qquad a_2 = b_2, \qquad a_3 = b_3. \tag{2-2}$$

Equality of two vectors in space therefore implies that three equations in 
terms of real numbers must hold. 

The next operation with vectors, which arises naturally in physics, is 
that of changing the magnitude of a vector without changing its direction, 
for example: The force on a body is doubled without altering the direction 
of application. Figure 2-3 illustrates this point. It shows, in terms of 
the point notation, that if the magnitude of vector $\mathbf{a} = (a_1, a_2)$ is changed 
by a multiple $\lambda$, the new vector, $\lambda\mathbf{a}$, is 

$$\lambda\mathbf{a} = (\lambda a_1, \lambda a_2).$$

To multiply the magnitude of $\mathbf{a}$ by $\lambda$ without changing its direction, each 
coordinate has to be multiplied by $\lambda$. The same procedure applies to 
three dimensions: 

$$\lambda\mathbf{a} = (\lambda a_1, \lambda a_2, \lambda a_3). \tag{2-3}$$

Figure 2-3 

In the preceding section we have been assuming that $\lambda > 0$. We can easily 
generalize the notation and allow $\lambda$ to be negative. If $\lambda < 0$, the magnitude 
of the vector is changed by a factor $-\lambda$, and its direction along its 
line of application is completely reversed. Frequently we shall refer to a 
real number as a scalar (in contradistinction to a vector or matrix). The 
operation of multiplying any given vector by a real number $\lambda$ will be 
referred to as multiplying a vector by a scalar. The rule for multiplying 
a vector by a scalar, when using point notation, is given by (2-3). 

In geometrical language, the magnitude of a vector $\mathbf{a}$, which we shall 
denote by $|\mathbf{a}|$, is often called its “length,” since the length of the line 
from the origin to the terminal point of the vector represents its magnitude. 
By our definition, the magnitude of $\lambda\mathbf{a}$ is $|\lambda\mathbf{a}| = |\lambda|\,|\mathbf{a}|$, where $|\lambda|$ 
is the absolute value of $\lambda$. 

Examples: (1) If $\mathbf{a} = (2, 3)$, $6\mathbf{a} = (12, 18)$. (2) If $\mathbf{a} = (1, -1)$, 
$-4\mathbf{a} = (-4, 4)$. Illustrate these graphically. 
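
The two examples, together with the relation between the magnitude of a vector and that of its scalar multiple, can be reproduced in a few lines. This is only a sketch; the names `scale` and `magnitude` are ours.

```python
import math


def scale(lam, a):
    """Multiply every coordinate of the vector a by the scalar lam, as in (2-3)."""
    return tuple(lam * x for x in a)


def magnitude(a):
    """Length of the vector a."""
    return math.sqrt(sum(x * x for x in a))


print(scale(6, (2, 3)))     # (12, 18)
print(scale(-4, (1, -1)))   # (-4, 4)

# The magnitude of the multiple is |lam| times the magnitude of a,
# whether lam is positive or negative.
a = (1, -1)
lam = -4
print(math.isclose(magnitude(scale(lam, a)), abs(lam) * magnitude(a)))  # True
```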

Another very important operation is the addition of vectors. Again 
we turn to mechanics for our intuitive foundations. It is well known 
that if two forces act on a particle (a proton, for example) to produce 
some resultant motion, the same motion can be produced by applying a 
single force. This single force can, in a real sense, be considered to be the 
sum of the original two forces. The rule which we use in obtaining the magnitude 
and direction of a single force which replaces the original two forces 
is rather interesting: If a, b are the original forces, then the single force c, 
which we shall call the sum of a, b, is the diagonal of the parallelogram 
with sides a, b. This is illustrated in Fig. 2-4. The addition of vectors 
follows what, in elementary physics, is known as the parallelogram law. 

The rule for addition of vectors is very simple when point representation 
is used. We merely add corresponding coordinates as shown in 
Fig. 2-4. If 

$$\mathbf{a} = (a_1, a_2), \qquad \mathbf{b} = (b_1, b_2),$$

then 

$$\mathbf{c} = \mathbf{a} + \mathbf{b} = (a_1 + b_1, a_2 + b_2) = (c_1, c_2).$$

Examination of the parallelogram law, when applied to three dimensions, 
shows that precisely the same sort of result holds. If 

$$\mathbf{a} = (a_1, a_2, a_3), \qquad \mathbf{b} = (b_1, b_2, b_3),$$

$$\mathbf{c} = (c_1, c_2, c_3) = \mathbf{a} + \mathbf{b} = (a_1 + b_1, a_2 + b_2, a_3 + b_3). \tag{2-4}$$
To add three vectors, the first two are added to obtain the resultant, and 
the third is then added to the resultant of the first two. If 

$$\mathbf{a} = (a_1, a_2, a_3), \qquad \mathbf{b} = (b_1, b_2, b_3), \qquad \mathbf{c} = (c_1, c_2, c_3);$$

$$\mathbf{d} = \mathbf{a} + \mathbf{b} + \mathbf{c} = (a_1 + b_1 + c_1, a_2 + b_2 + c_2, a_3 + b_3 + c_3) = (d_1, d_2, d_3). \tag{2-5}$$

In the same way any number of vectors can be added. Note that 

$$\mathbf{a} + \mathbf{b} = \mathbf{b} + \mathbf{a}.$$

To subtract $\mathbf{b}$ from $\mathbf{a}$, we add $(-1)\mathbf{b}$ to $\mathbf{a}$, that is, 

$$\mathbf{a} - \mathbf{b} = \mathbf{a} + (-1)\mathbf{b} = (a_1 - b_1, a_2 - b_2, a_3 - b_3). \tag{2-6}$$
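
Rules (2-4) through (2-6) translate directly into coordinate arithmetic. The sketch below uses hypothetical vectors chosen only for illustration; `add` and `subtract` are names of our own.

```python
def add(*vectors):
    """Add any number of vectors by adding corresponding coordinates."""
    if len({len(v) for v in vectors}) != 1:
        raise ValueError("vectors must have the same number of coordinates")
    return tuple(sum(coords) for coords in zip(*vectors))


def subtract(a, b):
    """Compute a - b as a + (-1)b, following (2-6)."""
    return add(a, tuple(-1 * x for x in b))


# Hypothetical sample vectors.
a = (1, 2, 3)
b = (4, 5, 6)
c = (7, 8, 9)

print(add(a, b))                 # (5, 7, 9)
print(add(a, b, c))              # (12, 15, 18)
print(add(a, b) == add(b, a))    # True
print(subtract(a, b))            # (-3, -3, -3)
```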

The concept of addition of vectors brings up a new idea, that of the 
resolution of a vector into components. Consider the vectors 

$$\mathbf{a} = (a_1, a_2), \qquad \mathbf{a}_1 = (a_1, 0), \qquad \mathbf{a}_2 = (0, a_2).$$

In this example, $\mathbf{a}_1$ is a vector of length (magnitude) $|a_1|$ lying along the 
$x_1$-axis, and $\mathbf{a}_2$ a vector of length $|a_2|$ lying along the $x_2$-axis. It will be 
observed that 

$$\mathbf{a} = \mathbf{a}_1 + \mathbf{a}_2 = (a_1, 0) + (0, a_2) = (a_1, a_2). \tag{2-7}$$

The vectors $\mathbf{a}_1$, $\mathbf{a}_2$ are called the vector components of $\mathbf{a}$ along the coordinate 
axes (see Fig. 2-5). 

The concept of resolving a vector into its vector components along the 
coordinate axes is a very useful one. However, it can be developed even 
further. Applying the rule for multiplication by a scalar, we can write 

$$\mathbf{a}_1 = (a_1, 0) = a_1(1, 0), \qquad \mathbf{a}_2 = (0, a_2) = a_2(0, 1).$$

Hence 

$$\mathbf{a} = (a_1, a_2) = a_1(1, 0) + a_2(0, 1). \tag{2-8}$$

The vector $(1, 0)$ lies along the $x_1$-axis and has length one. The vector 
$(0, 1)$ lies along the $x_2$-axis and has length one. The vectors $(1, 0)$ and 
$(0, 1)$ are called unit vectors and will be denoted by $\mathbf{e}_1$, $\mathbf{e}_2$, respectively. 
The numbers $a_1$, $a_2$ are called the components of $\mathbf{a}$ along the $x_1$-, $x_2$-axes. 
They are not the vector components $\mathbf{a}_1$, $\mathbf{a}_2$; the latter are obtained by 
multiplying the components by the corresponding unit vectors. Note that 
$a_i$, the component of $\mathbf{a}$ along the ith coordinate axis, is the ith coordinate 
of the point $(a_1, a_2)$, $i = 1, 2$. 

Similarly, working with three dimensions, we can write 

$$\mathbf{a} = (a_1, a_2, a_3) = a_1(1, 0, 0) + a_2(0, 1, 0) + a_3(0, 0, 1) = a_1\mathbf{e}_1 + a_2\mathbf{e}_2 + a_3\mathbf{e}_3. \tag{2-9}$$

Equation (2-8) shows that any vector lying in a plane can be written as 
the sum of scalar multiples of the two unit vectors. In three dimensions, 
any vector can be written as the sum of scalar multiples of the three 
unit vectors: 

$$\mathbf{e}_1 = (1, 0, 0), \qquad \mathbf{e}_2 = (0, 1, 0), \qquad \mathbf{e}_3 = (0, 0, 1).$$
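
The resolution of a vector into its vector components can be sketched as follows. The sample vector is hypothetical, and the helper names are ours.

```python
def unit_vector(i, n):
    """The ith unit vector with n coordinates (i counts from 1)."""
    return tuple(1 if k == i else 0 for k in range(1, n + 1))


def scale(lam, a):
    """Multiply the vector a by the scalar lam."""
    return tuple(lam * x for x in a)


def add(*vectors):
    """Add vectors coordinate by coordinate."""
    return tuple(sum(coords) for coords in zip(*vectors))


a = (4, -2, 7)  # hypothetical sample vector

# Vector components: each component a_i times the corresponding unit vector.
vector_components = [scale(a[i - 1], unit_vector(i, 3)) for i in (1, 2, 3)]
print(vector_components)               # [(4, 0, 0), (0, -2, 0), (0, 0, 7)]

# Their sum gives back the original vector.
print(add(*vector_components) == a)    # True
```

The distinction drawn in the text shows up in the types: the components `a[i - 1]` are numbers, whereas each entry of `vector_components` is itself a vector.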

2-3 The scalar product. We shall now discuss vector multiplication. 
Let us assume that we have two vectors $\mathbf{a}$, $\mathbf{b}$, and that $\theta$ is the angle between 
these vectors, as shown in Fig. 2-6. Consider the expression 

$$|\mathbf{a}|\,|\mathbf{b}| \cos\theta. \tag{2-10}$$

Figure 2-7 

The number $|\mathbf{b}| \cos\theta$ is, aside from sign, the magnitude of the perpendicular 
projection of $\mathbf{b}$ on the vector $\mathbf{a}$. It is called the component of 
$\mathbf{b}$ along $\mathbf{a}$. Thus, Eq. (2-10) represents the magnitude of $\mathbf{a}$ times the 
component of $\mathbf{b}$ along $\mathbf{a}$. The number computed from (2-10) is called 
the scalar product of $\mathbf{a}$ and $\mathbf{b}$, and we write 

$$\mathbf{a}\mathbf{b}' = |\mathbf{a}|\,|\mathbf{b}| \cos\theta, \tag{2-11}$$

where the symbol ab' denotes the scalar product of the two vectors. The 
scalar product of two vectors is a scalar, not a vector. This may seem to 
be a rather strange definition. However, it can be easily explained by an 
example taken from mechanics: 

The work done in moving an object along a straight path a distance $\mathbf{r}$ 
(distance is a vector since it has both magnitude and direction), using a 
constant force $\mathbf{f}$, is given by the product of the magnitude of the distance 
and the component of the force along $\mathbf{r}$, that is, by $|\mathbf{r}|\,|\mathbf{f}| \cos\theta$, if $\theta$ is the 
angle between $\mathbf{f}$ and $\mathbf{r}$. Therefore, the work $w$ (a scalar) is the scalar product 
of $\mathbf{f}$, $\mathbf{r}$, or $w = \mathbf{f}\mathbf{r}'$. 

The definition of a scalar product implies that ab' = ba'. Since the 
scalar product of two vectors is a scalar, it is not possible to form the 
scalar product of three or more vectors. 

Next, we shall transform the scalar product into an expression which is 
much more suitable for extension to higher dimensions. We can write 

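$$\mathbf{a} = a_1\mathbf{e}_1 + a_2\mathbf{e}_2 + a_3\mathbf{e}_3, \qquad \mathbf{b} = b_1\mathbf{e}_1 + b_2\mathbf{e}_2 + b_3\mathbf{e}_3.$$

Consider also the vector $\mathbf{c} = \mathbf{a} - \mathbf{b}$: 

$$\mathbf{c} = (a_1 - b_1)\mathbf{e}_1 + (a_2 - b_2)\mathbf{e}_2 + (a_3 - b_3)\mathbf{e}_3.$$

Then from Fig. 2-7 and the cosine law of trigonometry 

$$|\mathbf{c}|^2 = |\mathbf{a}|^2 + |\mathbf{b}|^2 - 2|\mathbf{a}|\,|\mathbf{b}| \cos\theta.$$

Therefore 

$$\mathbf{a}\mathbf{b}' = \tfrac{1}{2}\left[|\mathbf{a}|^2 + |\mathbf{b}|^2 - |\mathbf{c}|^2\right]. \tag{2-12}$$

But 

$$|\mathbf{a}|^2 = \sum_{i=1}^{3} a_i^2, \qquad |\mathbf{b}|^2 = \sum_{i=1}^{3} b_i^2, \qquad |\mathbf{c}|^2 = \sum_{i=1}^{3} (a_i - b_i)^2.$$

Hence 

$$\mathbf{a}\mathbf{b}' = a_1b_1 + a_2b_2 + a_3b_3. \tag{2-13}$$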

Equation (2-13) is important since it shows that the scalar product of 
two vectors is computed by simply multiplying the corresponding components 
and adding the results. 

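The unit vectors $\mathbf{e}_1$, $\mathbf{e}_2$, $\mathbf{e}_3$ lie along the coordinate axes, which are 
assumed to be at right angles to each other. Hence 

$$\mathbf{e}_1\mathbf{e}_2' = |\mathbf{e}_1|\,|\mathbf{e}_2| \cos\frac{\pi}{2} = 0, \qquad \mathbf{e}_1\mathbf{e}_1' = |\mathbf{e}_1|\,|\mathbf{e}_1| \cos 0 = 1,$$

or, in general, 

$$\mathbf{e}_i\mathbf{e}_j' = 0 \quad (i \neq j), \qquad \mathbf{e}_i\mathbf{e}_i' = 1 \quad (i = 1, 2, 3). \tag{2-14}$$

The scalar product of a vector with itself is the square of the length of 
the vector: 

$$\mathbf{a}\mathbf{a}' = a_1^2 + a_2^2 + a_3^2 = |\mathbf{a}|^2. \tag{2-15}$$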

Examples: 

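(1) If $\mathbf{a} = (2, 3, 1)$, $\mathbf{b} = (1, 7, 5)$: 
$\mathbf{a}\mathbf{b}' = 2(1) + 3(7) + 1(5) = 28$. 

(2) If $\mathbf{a} = (a_1, a_2, a_3)$, $\mathbf{b} = (0, 1, 0) = \mathbf{e}_2$: 
$\mathbf{a}\mathbf{b}' = \mathbf{a}\mathbf{e}_2' = a_1(0) + a_2(1) + a_3(0) = a_2$. 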
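
The agreement between the component formula (2-13), the cosine-law form (2-12), and the geometric definition (2-11) can be checked numerically for the first example. This is a sketch only; the function names are ours.

```python
import math


def scalar_product(a, b):
    """Multiply corresponding components and add the results, as in (2-13)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of components")
    return sum(x * y for x, y in zip(a, b))


def magnitude(a):
    """Length of a, using the fact that aa' is the square of the length (2-15)."""
    return math.sqrt(scalar_product(a, a))


a = (2, 3, 1)
b = (1, 7, 5)
c = tuple(x - y for x, y in zip(a, b))   # c = a - b

by_components = scalar_product(a, b)
by_cosine_law = 0.5 * (magnitude(a) ** 2 + magnitude(b) ** 2 - magnitude(c) ** 2)

# Angle between a and b recovered from the definition |a||b|cos(theta).
theta = math.acos(by_components / (magnitude(a) * magnitude(b)))

print(by_components)                     # 28
print(by_cosine_law)                     # 28 up to round-off
print(math.degrees(theta))

# Second example: the scalar product with e2 picks out the second component.
print(scalar_product(a, (0, 1, 0)))      # 3
```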

In the preceding sections we have attempted to give an intuitive development 
of vector operations. Depending on one's background, this 
may or may not have seemed to be a natural development. In any event, 
it is desirable to have some familiarity with the material so that generalizations 
will not seem completely unmotivated. Many operations with 
vectors which are very important in physics and engineering have not been 
considered here. We have only discussed those parts which form a foundation 
for n-dimensional generalizations. Before going on to these generalizations, 
let us note that vector analysis of the type used in physics 
and engineering goes back to Willard Gibbs (the founder of modern 
thermodynamics) and to the more clumsy quaternions used by Hamilton. 

2-4 Generalization to higher dimensions. We have seen that at the 
most elementary level in physics, a vector can be described as a physical 
quantity possessing both magnitude and direction. The behavior of the 
real world determines the laws governing operations with these vectors. 
Next we observed that, instead of characterizing a vector by magnitude 
and direction, an equally satisfactory description could be achieved by 
the terminal point of a vector of proper magnitude and direction emanating 
from the origin of the coordinate system. We then wrote $\mathbf{a} = 
(a_1, a_2, a_3)$; the $a_i$ were called the components of the vector along the 
coordinate axes. The numbers $a_i$ were also the coordinates of the point 
where the head of the vector terminated. From the laws for operating 
with vectors we derived rules applicable to operations with their components. 

Let us suppose that we would like to divorce the concept of a vector 
from any physical connotations. In addition, we shall assume that we 
would like to develop a sound mathematical theory of vectors. Naturally, 
we hope to arrive at some useful generalizations of the theory. First we 
must decide upon a suitable definition of a vector. From our new point 
of view it is not very satisfactory to define a vector as a quantity having 
magnitude and direction, since this definition does not provide a very 
concrete concept and does not allow for an immediately obvious algebraic 
expression. The concept is especially fuzzy if we contemplate extending 
it to spaces of dimension greater than three. However, such an extension 
of the theory is one of the first generalizations which will come to mind. 

The key to finding a definition which permits the development of a 
rigorous theory lies in focusing attention on the point representation of a 
vector. Our study of the intuitive foundations has shown us that a vector 
can be represented by an ordered array of numbers $(a_1, a_2)$ or $(a_1, a_2, a_3)$; 
hence, we may apply this concept when generalizing and simply define 
a vector as an ordered array of numbers $(a_1, a_2)$ or $(a_1, a_2, a_3)$. This 
is precisely what we shall do. From our new point of view, a vector 
will be nothing more or less than an ordered array of numbers; it will not 
have any physical meaning attached to it whatever. Once we take this 
step, it immediately becomes apparent that we are not limited to vectors 
containing at most three components. We may now generalize our notion 
of a vector to one containing any finite number of components, i.e., an 
ordered array of n numbers $(a_1, a_2, \ldots, a_n)$, which we shall call an 
ordered n-tuple. Indeed, this generalization will be incorporated into the 
theory. 

Since we no longer plan to attach any physical meaning to vectors, we 
are free to define operations with these vectors in any way we choose. 
However, if the intuitive foundations of the subject are to be of any 
value, they should point the way to proper definitions. This is the 
procedure we shall follow. The definitions of operations with vectors in the 
generalized sense will be direct extensions of the results obtained from 
physical considerations. 

When generalizing the theory of vectors, it is of considerable advantage 
to retain some of the geometrical terms and concepts which were so 
helpful in the intuitive approach. In this way, we may give to many of 
the general results obtained a clear geometrical interpretation in two 
and three dimensions. It is very easy to relate our new definition of a 
vector to geometrical concepts. Just as $(a_1, a_2)$, $(a_1, a_2, a_3)$ can be considered 
to be points in two- and three-dimensional spaces respectively, 
$(a_1, \ldots, a_n)$ can be considered to be a point in an n-dimensional space, 
where the $a_i$ are the “coordinates” of the point. As a matter of fact, from 
here on we shall assume that the concepts of point and vector mean precisely 
the same thing. Our development will then proceed simultaneously 
along two paths, namely: (1) “rigorous” algebraic development; (2) geometric 
interpretation. At first the geometric interpretation will be somewhat 
intuitive. However, before we are finished, the idea of an n-dimensional 
space will have been removed from the intuitive realm, and we shall 
have defined precisely what we mean by one important n-dimensional 
space, called euclidean space. A number of useful properties of this space 
will be developed. 

After this rather long introduction to the subject of generalizing the 
theory of vectors, we shall now proceed with the development of the 
theory. First, we shall repeat in a more formal way our new definition of 
a vector: 

Vector: An n-component vector* $\mathbf{a}$ is an ordered n-tuple of numbers 
written as a row $(a_1, a_2, \ldots, a_n)$ or as a column 

$$\begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{bmatrix}.$$

The $a_i$, $i = 1, \ldots, n$, are assumed to be real numbers and are called the 
components of the vector. 

Whether a vector is written as a row or a column is immaterial. The two 
representations are equivalent. Notational convenience determines which 
is most suitable. In the future we shall frequently use both row and column 
vectors. However, the column representation will be used more frequently 
than the row representation, and for this reason we shall in the remainder 
of this chapter usually think of vectors as columns of numbers. Since 
it is rather clumsy to print these columns, a row with square brackets 
enclosing the n numbers will be used to represent a column vector: 

$$\begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{bmatrix} = [a_1, \ldots, a_n]. \tag{2-16}$$

Parentheses will enclose the n components of a row vector, that is, 
$(a_1, \ldots, a_n)$ will always represent a row vector. The notational difference 
between row and column vectors should be carefully noted. 

* Frequently an n-component vector is referred to as an n-dimensional vector. 
This seems to be a very appropriate terminology. However, some mathematicians 
object to attaching the term “dimension” to a single vector or point in 
space, since dimension is a property of space, not of a point. 

The ordering of the numbers in the n-tuple which forms a vector is 
crucial. Different ordering represents different vectors. For example, in 
three dimensions, (1, 2, 3) and (3, 1, 2) are clearly not the same vector. 

As has been already indicated, there is a complete equivalence between 
n-component vectors and the points in an n-dimensional space. An n-tuple 
will be called a point or a vector depending on whether, at any particular 
moment, we view it from an algebraic or geometric standpoint.* Although, 
on several occasions, we have mentioned a point in an n-dimensional 
space, it is expected that the reader has as yet only a vague intuitive idea 
of such a space. More precise concepts of an n-dimensional space, coordinates, 
etc., will be introduced shortly. 

Several useful vectors are often referred to by name, and hence we shall 
define them at the outset. We shall begin with an important set of vectors, 
the unit vectors. For vectors having n components there are n unit vectors. 
They are: 

$$\mathbf{e}_1 = [1, 0, \ldots, 0], \quad \mathbf{e}_2 = [0, 1, 0, \ldots, 0], \quad \ldots, \quad \mathbf{e}_n = [0, 0, \ldots, 1]. \tag{2-17}$$


It will be recalled that these vectors were introduced to advantage in our 
intuitive study of two and three dimensions. 

Unit vector: A unit vector, denoted by $\mathbf{e}_i$, is a vector with unity as the 
value of its ith component and with all other components zero. 

The symbol $\mathbf{e}_i$ will always be used for a unit vector. The number of components 
in the unit vector will be clear from the context. 

* When we speak of $\mathbf{a}$ as a vector, then the $a_i$, $i = 1, \ldots, n$ of (2-16) are 
called the components of $\mathbf{a}$. If we are thinking of $\mathbf{a}$ as a point in an n-dimensional 
space, then the $a_i$ are frequently called the coordinates of $\mathbf{a}$. Consequently, 
component and coordinate are the respective algebraic and geometric names 
for the same element in the n-tuple (2-16). 

Null vector: The null vector, or zero vector, written $\mathbf{0}$, is a vector all of 
whose components are zero: 

$$\mathbf{0} = [0, 0, \ldots, 0]. \tag{2-18}$$

Sum vector: A sum vector is a vector having unity as a value for each 
component; it will be written $\mathbf{1}$: 

$$\mathbf{1} = (1, 1, \ldots, 1). \tag{2-19}$$

In general, the sum vector will be used as a row vector rather than a 
column vector. The reason for calling it a sum vector will become clear 
later. 


2-5 Generalized vector operations. Following the procedure suggested 
in the previous section, we shall define operations with vectors as straightforward 
generalizations of the results obtained in three dimensions. The 
method for generalizing is: (1) the operations are written using the point 
representation, that is, in component form; (2) the generalization follows 
immediately. 

Equality: Two n-component vectors $\mathbf{a}$, $\mathbf{b}$ are said to be equal, written 
$\mathbf{a} = \mathbf{b}$, if and only if all the corresponding components are equal: 

$$a_i = b_i, \qquad i = 1, \ldots, n. \tag{2-20}$$

Equality of two n-component vectors means that, in terms of real 
numbers, there are n equations which must hold. Two vectors cannot be 
equal unless they have the same number of components. Note that if 
a = b, then b = a. 

Examples: 

(1) If $\mathbf{a} = [a_1, a_2, a_3, a_4]$ and $\mathbf{b} = [b_1, b_2, b_3, b_4]$, then if $\mathbf{a} = \mathbf{b}$, 
$a_1 = b_1$, $a_2 = b_2$, $a_3 = b_3$, and $a_4 = b_4$. 

(2) The vectors $\mathbf{a} = [0, 2, 1]$ and $\mathbf{b} = [0, 2, 2]$ are not equal since the 
third component of $\mathbf{a}$ is different from the third component of $\mathbf{b}$. 

(3) If $\mathbf{a} = [2, 1]$ and $\mathbf{b} = [2, 1, 3]$, $\mathbf{a} \neq \mathbf{b}$ since $\mathbf{a}$, $\mathbf{b}$ do not have the 
same number of components. 

Occasionally we shall find use for vector inequalities. 

Inequalities: Given two n-component vectors a, b, then $\mathbf{a} \geq \mathbf{b}$ means
$a_i \geq b_i$, $i = 1, \ldots, n$, and $\mathbf{a} \leq \mathbf{b}$ means $a_i \leq b_i$, $i = 1, \ldots, n$.
Similarly, $\mathbf{a} > \mathbf{b}$ means $a_i > b_i$ for all i, and $\mathbf{a} < \mathbf{b}$ means $a_i < b_i$ for all i.

Examples:

(1) a = [0, 1, 2], b = [-1, 0, 1]; a > b since 0 > -1, 1 > 0,
2 > 1.

(2) a = [0, 1, 0], b = [-1, 0, 1]; a is not > b (written $\mathbf{a} \not> \mathbf{b}$)
since for the third components 0 < 1.

Multiplication by a scalar: The product of a scalar $\lambda$ and a vector
$\mathbf{a} = [a_1, \ldots, a_n]$, written $\lambda\mathbf{a}$, is defined as the vector

$$\lambda\mathbf{a} = [\lambda a_1, \lambda a_2, \ldots, \lambda a_n]. \tag{2-21}$$

If a > b and $\lambda > 0$, then $\lambda\mathbf{a} > \lambda\mathbf{b}$. This follows immediately, since
$a_i > b_i$, and if $\lambda > 0$, then $\lambda a_i > \lambda b_i$. However, if a > b and $\lambda < 0$,
then $\lambda\mathbf{a} < \lambda\mathbf{b}$. This becomes equally clear if one considers the relations
in terms of the vector components. Multiplying an inequality of real
numbers by a negative number changes the direction of the inequality
(5 > 1, -5 < -1).

Examples: 

(1) If a = [3, 4, 5], then 3a = [9, 12, 15].

(2) If a = [3, 1, 2], b = [4, 2, 4]; a < b. Also -2a > -2b since 
-2a = [-6, -2, -4] > -2b = [-8, -4, -8]. 

Addition: The sum of two vectors $\mathbf{a} = [a_1, \ldots, a_n]$ and $\mathbf{b} = [b_1, \ldots, b_n]$,
written a + b, is defined to be the vector

$$\mathbf{a} + \mathbf{b} = [a_1 + b_1, a_2 + b_2, \ldots, a_n + b_n]. \tag{2-22}$$


This definition applies only to vectors having the same number of 
components. The sum of two n-component vectors is another n-component 
vector. 

Since addition is done by components, we immediately see that the 
addition of vectors possesses the commutative and associative properties 
of real numbers: 

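$$\mathbf{a} + \mathbf{b} = \mathbf{b} + \mathbf{a} \quad \text{(commutative property)}, \tag{2-23}$$

$$\mathbf{a} + (\mathbf{b} + \mathbf{c}) = (\mathbf{a} + \mathbf{b}) + \mathbf{c} = \mathbf{a} + \mathbf{b} + \mathbf{c} \quad \text{(associative property)}. \tag{2-24}$$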

In the same way, we see that for scalars $\lambda$:

$$\lambda(\mathbf{a} + \mathbf{b}) = \lambda\mathbf{a} + \lambda\mathbf{b}, \tag{2-25}$$

$$(\lambda_1 + \lambda_2)(\mathbf{a} + \mathbf{b}) = \lambda_1\mathbf{a} + \lambda_1\mathbf{b} + \lambda_2\mathbf{a} + \lambda_2\mathbf{b}. \tag{2-26}$$

Subtraction: Subtraction of two vectors is defined in terms of operations
already considered:

$$\mathbf{a} - \mathbf{b} = \mathbf{a} + (-1)\mathbf{b} = [a_1 - b_1, \ldots, a_n - b_n]. \tag{2-27}$$

The concepts of addition and multiplication by a scalar can be combined 
to yield a linear combination of vectors. 

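Linear combination: Given m n-component vectors $\mathbf{a}_1, \ldots, \mathbf{a}_m$, the
n-component vector

$$\mathbf{a} = \sum_{i=1}^{m} \lambda_i\mathbf{a}_i = \lambda_1\mathbf{a}_1 + \cdots + \lambda_m\mathbf{a}_m \tag{2-28}$$

is called a linear combination of $\mathbf{a}_1, \ldots, \mathbf{a}_m$ for any scalars $\lambda_i$,
$i = 1, \ldots, m$.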

Examples: 

(1) a = [1,3,5], b= [2,4,6]; 

a + b = [(1 + 2), (3 + 4), (5 + 6)] = [3, 7, 11] 

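(2) $\mathbf{a}_1 = [2, 3, 4, 7]$, $\mathbf{a}_2 = [0, 0, 0, 1]$, $\mathbf{a}_3 = [1, 0, 1, 0]$;
$\mathbf{a}_1 + 2\mathbf{a}_2 + 3\mathbf{a}_3 = [2, 3, 4, 7] + 2[0, 0, 0, 1] + 3[1, 0, 1, 0] = [5, 3, 7, 9]$.

(3) $$\mathbf{a} - \mathbf{a} = \mathbf{0}, \quad \mathbf{a} + \mathbf{0} = \mathbf{a}; \tag{2-29}$$

a + a implies $a_i + a_i = 2a_i$; thus a + a = 2a.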
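
The operations of this section translate directly into code that works component by component. The following is a minimal sketch, assuming vectors are stored as plain Python lists; the function names `scale`, `add`, and `linear_combination` are illustrative choices, not notation from the text.

```python
from typing import List, Sequence

Vector = List[float]


def scale(lam: float, a: Sequence[float]) -> Vector:
    """Multiplication by a scalar, following (2-21)."""
    return [lam * a_i for a_i in a]


def add(a: Sequence[float], b: Sequence[float]) -> Vector:
    """Addition by components, following (2-22)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of components")
    return [a_i + b_i for a_i, b_i in zip(a, b)]


def linear_combination(lams, vectors) -> Vector:
    """Form lam_1 a_1 + ... + lam_m a_m, following (2-28)."""
    if not vectors or len(lams) != len(vectors):
        raise ValueError("need one scalar per vector and at least one vector")
    total = [0] * len(vectors[0])  # start from the null vector
    for lam, a in zip(lams, vectors):
        total = add(total, scale(lam, a))
    return total


# Example (2) of this section.
a1, a2, a3 = [2, 3, 4, 7], [0, 0, 0, 1], [1, 0, 1, 0]
print(linear_combination([1, 2, 3], [a1, a2, a3]))  # [5, 3, 7, 9]
```

The length check in `add` mirrors the remark that addition is defined only for vectors having the same number of components.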

2-6 Euclidean space and the scalar product. Using the operations 
defined in the foregoing section, we see that any n-component vector can 
be written as a linear combination of the n unit vectors (2-17): 

$$\mathbf{a} = [a_1, \ldots, a_n] = a_1\mathbf{e}_1 + \cdots + a_n\mathbf{e}_n. \tag{2-30}$$

This is a straightforward generalization from two and three to higher
dimensions. It will be recalled that, in operations with two and three
dimensions, the unit vectors lay along the coordinate axes. We have,
as yet, no coordinate system in our n-dimensional space. However, the
preceding discussion suggests that we use unit vectors to define a coordinate
system. If we imagine that a coordinate axis is "drawn" along each unit
vector, we shall obtain a coordinate system for n-dimensional space.
Furthermore, the ith component of a is the component of a along the ith
coordinate axis (the same was true in three dimensions). The null vector 0
is the origin of our coordinate system. Whenever we write a vector a
as $[a_1, \ldots, a_n]$, we automatically have a special coordinate system in
mind, that is, the coordinate system defined by n unit vectors.

To complete the definition of our n-dimensional space by analogy to the
properties of three-dimensional space, we must introduce the notion of
distance and the related notions of lengths and angles. Distance is a very
important concept since it forms the basis for notions of continuity and
hence for analysis. A convenient way of introducing distance, lengths,
and angles is to define the scalar product of two vectors.

Scalar product: The scalar product of two n-component vectors a, b is
defined to be the scalar

$$a_1b_1 + a_2b_2 + \cdots + a_nb_n = \sum_{i=1}^{n} a_ib_i. \tag{2-31}$$

It will be helpful to use different notations for the scalar product,
depending on whether a, b are row or column vectors.* If a, b are both
column vectors, we denote the scalar product by a'b. If a, b are both
row vectors, the scalar product will be denoted by ab'. When a is a row
vector and b is a column vector, the scalar product will be written ab.
For the present, we assume that both vectors are column vectors; hence
the scalar product can be written

$$\mathbf{a}'\mathbf{b} = \sum_{i=1}^{n} a_ib_i. \tag{2-32}$$

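We see immediately that

$$\mathbf{a}'\mathbf{b} = \mathbf{b}'\mathbf{a}, \tag{2-33}$$

$$\mathbf{a}'(\mathbf{b} + \mathbf{c}) = \mathbf{a}'\mathbf{b} + \mathbf{a}'\mathbf{c}, \quad (\mathbf{a} + \mathbf{b})'\mathbf{c} = \mathbf{a}'\mathbf{c} + \mathbf{b}'\mathbf{c}, \tag{2-34}$$

$$\mathbf{a}'(\lambda\mathbf{b}) = \lambda(\mathbf{a}'\mathbf{b}), \quad (\lambda\mathbf{a}')\mathbf{b} = \lambda(\mathbf{a}'\mathbf{b}); \quad \lambda \text{ any scalar}. \tag{2-35}$$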


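Distance: The distance from the vector (point) a to the vector (point) b,
written |a - b|, is defined as

$$|\mathbf{a} - \mathbf{b}| = [(\mathbf{a} - \mathbf{b})'(\mathbf{a} - \mathbf{b})]^{1/2} = \left[\sum_{i=1}^{n} (a_i - b_i)^2\right]^{1/2}. \tag{2-36}$$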


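Distance, as defined by (2-36), has the following properties:

(1) $|\mathbf{a} - \mathbf{b}| > 0$ unless $\mathbf{a} - \mathbf{b} = \mathbf{0}$; (2-37)

(2) $|\mathbf{a} - \mathbf{b}| = |\mathbf{b} - \mathbf{a}|$; (2-38)

(3) $|\mathbf{a} - \mathbf{b}| + |\mathbf{b} - \mathbf{c}| \geq |\mathbf{a} - \mathbf{c}|$ (triangle inequality). (2-39)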


* Our notation for scalar products differs from others that are often used,
such as a · b and (a, b). We have chosen our system in order to achieve
consistency with the matrix notation introduced later.

These algebraic expressions mean: (1) the distance between two different
points is always positive; (2) the distance from a to b is the same as the
distance from b to a; (3) in two and three dimensional spaces, the sum of
the lengths of two sides of a triangle is not less than the length of the
third side. The proofs for (2-37) through (2-39), based on definition (2-36),
are to be supplied in Problems 2-24 through 2-26.

It is important to note that we did not have to define distance by 
(2-36). There are many spaces and geometries in which distance is defined
differently. Indeed, there are even geometries in which distance is
not defined at all. For our purposes, and for many others, definition (2-36) 
is most suitable. 


Length: The length or magnitude of a vector a, written |a|, is defined as

$$|\mathbf{a}| = [\mathbf{a}'\mathbf{a}]^{1/2} = \left[\sum_{i=1}^{n} a_i^2\right]^{1/2}. \tag{2-40}$$


Note that length is a special case of distance since |a| = |a — 0|. 
The length of a is the distance from the origin to a. 

The preceding definitions of scalar product, distance, and length are
direct generalizations of the corresponding expressions for three
dimensions. The appropriate definitions are fairly obvious. It is not quite
so clear, however, how the angle between two vectors should be
generalized. Here, it is important to note that, in the intuitive introduction,
the angle between the two vectors was part of the definition of a scalar
product.

In this section we have defined the scalar product without reference to 
angles. We shall now use the original definition of the scalar product to 
define the angle between two vectors: 

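Angle: The angle $\theta$ between two vectors $\mathbf{a} = [a_1, \ldots, a_n]$ and
$\mathbf{b} = [b_1, \ldots, b_n]$, where $\mathbf{a}, \mathbf{b} \neq \mathbf{0}$, is computed from

$$\cos\theta = \frac{\mathbf{a}'\mathbf{b}}{|\mathbf{a}|\,|\mathbf{b}|} = \frac{\sum_{i=1}^{n} a_ib_i}{\left[\sum_{i=1}^{n} a_i^2\right]^{1/2}\left[\sum_{i=1}^{n} b_i^2\right]^{1/2}}. \tag{2-41}$$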
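
Definitions (2-32), (2-36), (2-40), and (2-41) can be checked numerically. The sketch below assumes vectors are plain Python lists of numbers; the names `dot`, `length`, `distance`, and `angle` are illustrative. The clamp inside `angle` is an added safeguard against floating-point rounding and is not part of the definition.

```python
import math


def dot(a, b):
    """Scalar product a'b as in (2-32)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of components")
    return sum(a_i * b_i for a_i, b_i in zip(a, b))


def length(a):
    """Length |a| = (a'a)^(1/2) as in (2-40)."""
    return math.sqrt(dot(a, a))


def distance(a, b):
    """Distance |a - b| as in (2-36)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of components")
    return length([a_i - b_i for a_i, b_i in zip(a, b)])


def angle(a, b):
    """Angle between two non-null vectors, from cos(theta) in (2-41)."""
    la, lb = length(a), length(b)
    if la == 0 or lb == 0:
        raise ValueError("the angle is defined only for non-null vectors")
    cos_theta = dot(a, b) / (la * lb)
    # Rounding can push the quotient slightly outside [-1, 1]; clamp it.
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return math.acos(cos_theta)


print(dot([1, 2, 3], [4, 5, 6]))   # 32
print(length([3, 4]))              # 5.0
print(angle([1, 0], [0, 1]))       # pi/2, since the scalar product is 0
```

Note that `length(a)` equals `distance(a, [0] * len(a))`, which is the statement that length is the distance from the origin.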


Note: The cosine of the angle between two vectors appears in statistics.
If we have n sets of historic or experimental data $(y_1, x_1), \ldots, (y_n, x_n)$,
and if we write

$$\mathbf{y} = [y_1 - \bar{y}, \ldots, y_n - \bar{y}], \qquad \mathbf{x} = [x_1 - \bar{x}, \ldots, x_n - \bar{x}],$$

where

$$\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i, \qquad \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i,$$

then the cosine of the angle between y and x is the correlation coefficient
for the sets of data.
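
The statistical remark can be illustrated with a short computation. This is a sketch under stated assumptions: the data values are invented for illustration, and the helper names `centered`, `cosine`, and `correlation` are hypothetical. Data with no variation in x or y produce a null vector, for which the angle is undefined, so the sketch does not handle that case.

```python
import math


def centered(values):
    """Subtract the mean from every observation."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def cosine(a, b):
    """cos(theta) between two non-null vectors, as in (2-41)."""
    scalar_product = sum(a_i * b_i for a_i, b_i in zip(a, b))
    norm_a = math.sqrt(sum(a_i * a_i for a_i in a))
    norm_b = math.sqrt(sum(b_i * b_i for b_i in b))
    return scalar_product / (norm_a * norm_b)


def correlation(ys, xs):
    """Correlation coefficient as the cosine between the centered vectors."""
    if len(ys) != len(xs):
        raise ValueError("need the same number of y and x observations")
    return cosine(centered(ys), centered(xs))


# Hypothetical observations, chosen only for illustration.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]
print(correlation(ys, xs))
```

Data lying exactly on a rising straight line give a cosine of 1 (angle 0), and data on a falling line give -1 (angle $\pi$).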

It can be easily shown that $\cos\theta$, as defined by (2-41), satisfies
$-1 \leq \cos\theta \leq 1$; therefore, we can always compute an angle $\theta$ lying between
0 and $\pi$. To establish that $-1 \leq \cos\theta \leq 1$ is equivalent to showing
that for any n-component vectors a, b

$$|\mathbf{a}'\mathbf{b}| \leq |\mathbf{a}|\,|\mathbf{b}|. \tag{2-42}$$

This property is called the Schwarz inequality. Problem 2-25 requires
proof of (2-42).

In two and three dimensions, two non-null vectors are called orthogonal
if the angle between them is $\pi/2$. In this case, the scalar product of the
two vectors vanishes [see (2-11) and note that $\cos\theta = 0$]. We can
generalize the concept of orthogonality to an n-dimensional space:

Orthogonality: Two vectors a, b ($\mathbf{a}, \mathbf{b} \neq \mathbf{0}$) are said to be orthogonal if
their scalar product vanishes, that is, $\mathbf{a}'\mathbf{b} = 0$.

From (2-41) we see that if two vectors are orthogonal,* $\cos\theta = 0$,
and the angle between the vectors is $\pi/2$. We notice immediately that the
unit vectors are orthogonal since $\mathbf{e}_i'\mathbf{e}_j = 0$, $i \neq j$. This means that
the coordinate system in n dimensions defined by the unit vectors is an
orthogonal coordinate system analogous to the orthogonal coordinate
systems for two and three dimensions.

Finally, we can define an n-dimensional space, often referred to as 
euclidean space: 

Euclidean space: An n-dimensional euclidean space (or euclidean vector
space) is defined as the collection of all vectors (points) $\mathbf{a} = [a_1, \ldots, a_n]$.
For these vectors, addition and multiplication by a scalar are defined by
(2-22) and (2-21), respectively. Furthermore, associated with any two
vectors in the collection is a non-negative number called the distance
between the two vectors; the distance is given by (2-36).

The n-dimensional spaces which we shall discuss here will always be
euclidean, represented by the symbol $E^n$. Ordinary two- and
three-dimensional spaces of the types considered in the intuitive introduction
are euclidean spaces. When n = 3, our n-dimensional space reduces to
the familiar concept of a three-dimensional space.

The definition of a euclidean space encompasses definitions for operating
with points in the space and for distance. Equation (2-41) provides the
definition of the angle between two points or vectors. Furthermore, we
have seen that the unit vectors can be used to define an orthogonal
coordinate system in this space. At this juncture, we could proceed to
introduce the notions of lines and planes in $E^n$, etc. However, we shall defer
the study of certain aspects of n-dimensional geometry to Chapter 6 and
devote the remainder of the present chapter to a discussion of some
properties of points or vectors in $E^n$.

* We can also say that 0 is orthogonal to every other vector, and that the
angle between 0 and any other vector is $\pi/2$. This will remove the restriction
$\mathbf{a}, \mathbf{b} \neq \mathbf{0}$ in the definition of orthogonality.

Before passing on to the subject of linear dependence, let us pause and 
illustrate a use for the sum vector defined by (2-19). If we form the 
scalar product of the sum vector and any other vector a, we obtain

$$\mathbf{1}\mathbf{a} = \sum_{i=1}^{n} a_i. \tag{2-43}$$

This scalar product is the sum of the components of a. Any summation 
can be written as the scalar product of the sum vector and the vector 
formed from the elements in the summation. The reason for calling 1 
the sum vector is now clear. 

2-7 Linear dependence. If one vector in a set of vectors from $E^n$ can
be written as a linear combination of some of the other vectors in the set, 
then we say that the given vector is linearly dependent on the others, 
and the set of vectors is also linearly dependent. If no vector in a collection 
of vectors can be written as a linear combination of the others, then the 
set of vectors is linearly independent. 

Linear dependence or independence are properties of the set of vectors 
and not of the individual vectors in the set. The differences between 
linearly dependent and linearly independent sets of vectors are very 
fundamental. The concepts of linear dependence and independence will 
appear repeatedly in our later developments. 

In a very crude intuitive sense, linearly dependent sets of vectors
contain an element of redundancy. Since at least one vector can be
represented as a linear combination of the others, we could drop any such
vector from the set without losing too much. In contrast, linearly
independent vectors are essentially different from each other. No vector can
be dropped from a linearly independent set of vectors without losing
something. Although this intuitive discussion is very vague, the precise
meaning of dependence and independence will become clear as the theory
is developed further.

We shall now give a more symmetric definition of linear dependence,
which makes it unnecessary to single out any one vector and attempt to
write it as a linear combination of the others. This new definition, which is
the standard mathematical definition of linear dependence, will be shown
to be equivalent to the intuitive concept outlined in the first paragraph
of this section.

Linear dependence: A set of vectors $\mathbf{a}_1, \ldots, \mathbf{a}_m$ from $E^n$ is said to be
linearly dependent if there exist scalars $\lambda_i$ not all zero such that

$$\lambda_1\mathbf{a}_1 + \lambda_2\mathbf{a}_2 + \cdots + \lambda_m\mathbf{a}_m = \mathbf{0}. \tag{2-44}$$

If the only set of $\lambda_i$ for which (2-44) holds is $\lambda_1 = \lambda_2 = \cdots = \lambda_m = 0$,
then the vectors are said to be linearly independent.

The above definition implies that a set of vectors is linearly dependent
if we can find, by one means or another, scalars not all zero such that
multiplication of each vector by the appropriate scalar and addition of the
resulting vectors will provide the null vector. If, however, no set of $\lambda_i$
other than all $\lambda_i = 0$ exists, then the vectors are linearly independent. A
set of vectors which is not linearly dependent must be linearly independent.

Let us show that this definition is equivalent to the intuitive concept
discussed at the beginning of this section. We want to prove that the
vectors $\mathbf{a}_1, \ldots, \mathbf{a}_m$ from $E^n$ are linearly dependent if and only if some one
of the vectors is a linear combination of the others. If one of the vectors is a
linear combination of the others, the vectors can be labeled so that this
one vector is $\mathbf{a}_m$. Then

$$\mathbf{a}_m = \lambda_1\mathbf{a}_1 + \cdots + \lambda_{m-1}\mathbf{a}_{m-1},$$

or

$$\lambda_1\mathbf{a}_1 + \cdots + \lambda_{m-1}\mathbf{a}_{m-1} + (-1)\mathbf{a}_m = \mathbf{0},$$


with at least one coefficient (-1) not zero. Thus, by (2-44), the vectors
are linearly dependent. Now suppose that the vectors are linearly
dependent. Then (2-44) holds, and at least one $\lambda_i \neq 0$. Label the $\lambda_i$ so
that $\lambda_m \neq 0$. Then

$$\lambda_m\mathbf{a}_m = -\lambda_1\mathbf{a}_1 - \cdots - \lambda_{m-1}\mathbf{a}_{m-1},$$

or

$$\mathbf{a}_m = -\frac{\lambda_1}{\lambda_m}\mathbf{a}_1 - \cdots - \frac{\lambda_{m-1}}{\lambda_m}\mathbf{a}_{m-1},$$

and one vector has been written as a linear combination of the others.
In this proof we have assumed, of course, that there are at least two
vectors in the set.

Although when we speak of a set of vectors as being linearly dependent 
or independent, it is usually assumed that two or more vectors are in the 
set, we must, in order to make later discussions consistent, define what is 
meant by linear dependence and independence for a set containing a 
single vector. The general definition expressed by (2-44) includes this
case. It says that a set containing a single vector a is linearly dependent
if there exists a $\lambda \neq 0$ such that $\lambda\mathbf{a} = \mathbf{0}$. This will be true if and only
if a = 0. Thus a set containing a single vector a is linearly independent
if $\mathbf{a} \neq \mathbf{0}$, and linearly dependent if a = 0.

A vector a is said to be linearly dependent on a set of vectors $\mathbf{a}_1, \ldots, \mathbf{a}_m$
if a can be written as a linear combination of $\mathbf{a}_1, \ldots, \mathbf{a}_m$; otherwise a is
said to be linearly independent of $\mathbf{a}_1, \ldots, \mathbf{a}_m$. It can be noted
immediately that the null vector is not linearly independent of any other vector
or set of vectors since

$$\mathbf{0} = 0\mathbf{a}_1 + 0\mathbf{a}_2 + \cdots + 0\mathbf{a}_m. \tag{2-45}$$

Thus the null vector is linearly dependent on every other vector, and no 
set of linearly independent vectors can contain the null vector. 

If a set of vectors is linearly independent, then any subset of these vectors
is also linearly independent. For suppose $\mathbf{a}_1, \ldots, \mathbf{a}_m$ are linearly
independent, while, for example, $\mathbf{a}_1, \mathbf{a}_2, \mathbf{a}_3$ are linearly dependent (3 < m).
In this case, there exist $\lambda_1, \lambda_2, \lambda_3$ not all zero such that
$\lambda_1\mathbf{a}_1 + \lambda_2\mathbf{a}_2 + \lambda_3\mathbf{a}_3 = \mathbf{0}$. If we take $\lambda_4 = \cdots = \lambda_m = 0$, then
$\sum_{i=1}^{m} \lambda_i\mathbf{a}_i = \mathbf{0}$, and
one or more $\lambda_i$ in the set $\lambda_1, \lambda_2, \lambda_3$ are not zero. This contradicts the fact
that $\mathbf{a}_1, \ldots, \mathbf{a}_m$ are linearly independent. Similarly, if any set of vectors
is linearly dependent, any larger set of vectors containing this set of vectors
is also linearly dependent.

Consider a set of vectors $\mathbf{a}_1, \ldots, \mathbf{a}_m$ from $E^n$. We say that the maximum
number of linearly independent vectors in this set is k if it contains at
least one subset of k vectors which is linearly independent, and there is
no linearly independent subset containing k + 1 vectors. If the set
$\mathbf{a}_1, \ldots, \mathbf{a}_m$ is linearly independent, then the maximum number of linearly
independent vectors in the set is m. Unless a set of vectors contains only
the null vector, the maximum number of linearly independent vectors
in the set will be at least one.

Suppose that k < m is the maximum number of linearly independent
vectors in a set of m vectors $\mathbf{a}_1, \ldots, \mathbf{a}_m$ from $E^n$. Then, given any linearly
independent subset of k vectors in this set, every other vector in the set can
be written as a linear combination of these k vectors. To see this, label the
vectors so that $\mathbf{a}_1, \ldots, \mathbf{a}_k$ are linearly independent. The set $\mathbf{a}_1, \ldots, \mathbf{a}_k$,
$\mathbf{a}_r$ must be linearly dependent for any $r = k + 1, \ldots, m$. This implies
that (2-44) holds and at least one $\lambda_i \neq 0$. However, $\lambda_r$ cannot be zero,
because this would contradict the fact that $\mathbf{a}_1, \ldots, \mathbf{a}_k$ are linearly
independent. Hence $\mathbf{a}_r$ can be written as a linear combination of $\mathbf{a}_1, \ldots, \mathbf{a}_k$.

Examples: (1) Consider the vectors of $E^2$, that is, two-component
vectors. Then any two non-null vectors a, b in $E^2$ are linearly dependent
if $\lambda_1\mathbf{a} + \lambda_2\mathbf{b} = \mathbf{0}$, $\lambda_1, \lambda_2 \neq 0$ (why must both $\lambda_1$ and $\lambda_2$ differ from
zero?) or $\mathbf{b} = \lambda\mathbf{a}$.
In $E^2$, two vectors are linearly dependent if one is a scalar multiple of the
other. Geometrically, this means that two vectors are linearly dependent
if they lie on the same line through the origin. For example, (1, 0), (3, 0)
are linearly dependent since (3, 0) = 3(1, 0). Both vectors lie along the
$x_1$-axis.

Because only vectors lying along the same line are linearly dependent
in $E^2$, it follows that any two vectors not lying along the same line in $E^2$
are linearly independent. Thus (1, 2), (2, 1) are linearly independent.

(2) It will be recalled that any vector in $E^2$ can be written as a linear
combination of the two unit vectors $\mathbf{e}_1, \mathbf{e}_2$. Let a, b be any two linearly
independent vectors in $E^2$. We can write

$$\mathbf{a} = a_1\mathbf{e}_1 + a_2\mathbf{e}_2,$$

$$\mathbf{b} = b_1\mathbf{e}_1 + b_2\mathbf{e}_2.$$

Consider any other vector $\mathbf{x} = (x_1, x_2) = x_1\mathbf{e}_1 + x_2\mathbf{e}_2$ in $E^2$. Since
a, b are linearly independent, either $a_1$ or $b_1$ will differ from zero. Assume
$a_1 \neq 0$. Then

$$\mathbf{e}_1 = \frac{1}{a_1}(\mathbf{a} - a_2\mathbf{e}_2),$$

$$\mathbf{b} = \frac{1}{a_1}[b_1\mathbf{a} + (a_1b_2 - a_2b_1)\mathbf{e}_2]. \tag{2-46}$$

However, $a_1b_2 - a_2b_1 \neq 0$, since a, b are linearly independent. The
vector x can be written

$$\mathbf{x} = \frac{1}{a_1}[x_1\mathbf{a} + (a_1x_2 - a_2x_1)\mathbf{e}_2]. \tag{2-47}$$

Solving for $\mathbf{e}_2$ in (2-46) and substituting into (2-47), we obtain

$$\mathbf{x} = \frac{1}{a_1}\left[x_1 - \frac{b_1(a_1x_2 - a_2x_1)}{a_1b_2 - a_2b_1}\right]\mathbf{a} + \left[\frac{a_1x_2 - a_2x_1}{a_1b_2 - a_2b_1}\right]\mathbf{b}, \tag{2-48}$$

or

$$\mathbf{x} = \lambda_1\mathbf{a} + \lambda_2\mathbf{b}.$$

We have expressed x as a linear combination of a, b and we see that there
cannot be more than two linearly independent vectors in any collection
of vectors from $E^2$. Any three vectors in $E^2$ are linearly dependent. The
geometrical analogue of (2-48) is shown in Fig. 2-8.
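
The coefficients of a and b in (2-48) can be evaluated directly. The sketch below assumes integer components so that exact fractions can be used; the function name `coefficients_in_E2` is hypothetical. The coefficient of a is written in the simplified form $(x_1b_2 - x_2b_1)/(a_1b_2 - a_2b_1)$, which is what the bracket multiplying a in (2-48) reduces to once the two terms are placed over a common denominator.

```python
from fractions import Fraction


def coefficients_in_E2(a, b, x):
    """Return (lam1, lam2) with x = lam1*a + lam2*b, following (2-48).

    Assumes integer components; a and b must be linearly independent.
    """
    a1, a2 = a
    b1, b2 = b
    x1, x2 = x
    det = a1 * b2 - a2 * b1  # nonzero exactly when a, b are independent
    if det == 0:
        raise ValueError("a and b are linearly dependent")
    lam1 = Fraction(x1 * b2 - x2 * b1, det)
    lam2 = Fraction(a1 * x2 - a2 * x1, det)
    return lam1, lam2


# (1, 2) and (2, 1) are the independent vectors from example (1).
lam1, lam2 = coefficients_in_E2((1, 2), (2, 1), (4, 5))
print(lam1, lam2)  # 2 1, since 2*(1, 2) + 1*(2, 1) = (4, 5)
```

Unlike the derivation in the text, the simplified formula does not require $a_1 \neq 0$; only the quantity $a_1b_2 - a_2b_1$ must differ from zero.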
[Figure 2-8]


(3) The vectors [1, 2, 4], [2, 2, 8], [1, 0, 4] are linearly dependent since

$$[1, 2, 4] - [2, 2, 8] + [1, 0, 4] = [0, 0, 0].$$

The vectors $\mathbf{e}_1 = [1, 0, 0]$, $\mathbf{e}_2 = [0, 1, 0]$, $\mathbf{e}_3 = [0, 0, 1]$ are linearly
independent since

$$\lambda_1\mathbf{e}_1 + \lambda_2\mathbf{e}_2 + \lambda_3\mathbf{e}_3 = [\lambda_1, \lambda_2, \lambda_3] = \mathbf{0}$$

implies the component equations $\lambda_1 = 0$, $\lambda_2 = 0$, $\lambda_3 = 0$.

The arguments outlined in the second example could be used also to
demonstrate that any four vectors from $E^3$ are linearly dependent.
Furthermore, if three vectors are linearly dependent, then they all lie in a
plane which passes through the origin. Geometric reasoning will show
that such a plane can be passed through any two vectors, and any linear
combination of these vectors will lie in that plane (prove this). Hence, if
any vector can be written as a linear combination of two vectors, it must
lie in the plane determined by the two vectors. Conversely, any three
vectors in $E^3$ which do not lie in a plane are linearly independent.

After having defined linear dependence and independence, we are faced 
with the question of how to determine whether any given set of vectors is 
linearly dependent. A complete answer will not be possible until after 
Chapters 3, 4, and 5 have been covered. However, we can indicate the 
nature of the problem. Suppose we have m n-component vectors
$\mathbf{a}_1 = [a_{11}, \ldots, a_{n1}], \ldots, \mathbf{a}_m = [a_{1m}, \ldots, a_{nm}]$. Two subscripts are used on
the components of the vectors. The first subscript refers to the relevant
component of the vector, and the second subscript indicates the relevant
vector. If the vectors are linearly dependent, then there exist $\lambda_i$ not all
zero such that

$$\sum_{i=1}^{m} \lambda_i\mathbf{a}_i = \mathbf{0}. \tag{2-49}$$

If (2-49) can be satisfied only with all $\lambda_i = 0$, the vectors are linearly
independent. Let us write the n component equations for (2-49). They
are:

$$\begin{aligned}
a_{11}\lambda_1 + \cdots + a_{1m}\lambda_m &= 0, \\
a_{21}\lambda_1 + \cdots + a_{2m}\lambda_m &= 0, \\
&\;\;\vdots \\
a_{n1}\lambda_1 + \cdots + a_{nm}\lambda_m &= 0.
\end{aligned} \tag{2-50}$$

Here we have a set of n simultaneous linear equations in m unknowns,
and we are to solve for the $\lambda_i$. Equations (2-50) are called homogeneous
since the right-hand side of each equation is zero. We have not studied
the properties of solutions to a set of equations like those in (2-50). We
shall do so in Chapter 5. The set of equations always has a solution, but
there may or may not be a solution with one or more $\lambda_i$ different from zero.
If the vectors are linearly dependent, there will be a solution with not all
$\lambda_i = 0$.
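
Although the text defers the theory of (2-50) to Chapter 5, a numerical test for dependence can be sketched now. The code below uses Gaussian elimination with exact fractions, a technique brought in here for illustration and not developed in this chapter: the vectors are written as rows and reduced, and they are linearly dependent exactly when fewer nonzero pivot rows remain than there are vectors. The names `count_pivots` and `is_linearly_dependent` are hypothetical.

```python
from fractions import Fraction


def count_pivots(vectors):
    """Row-reduce the vectors (as rows) and count the pivot rows found."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    n_cols = len(rows[0]) if rows else 0
    pivots = 0
    for col in range(n_cols):
        # Look for a row at or below position `pivots` with a nonzero entry.
        pivot_row = next(
            (r for r in range(pivots, len(rows)) if rows[r][col] != 0), None
        )
        if pivot_row is None:
            continue
        rows[pivots], rows[pivot_row] = rows[pivot_row], rows[pivots]
        # Eliminate this column from every row below the pivot row.
        for r in range(pivots + 1, len(rows)):
            factor = rows[r][col] / rows[pivots][col]
            rows[r] = [x - factor * p for x, p in zip(rows[r], rows[pivots])]
        pivots += 1
        if pivots == len(rows):
            break
    return pivots


def is_linearly_dependent(vectors):
    """True when some combination with scalars not all zero gives 0."""
    return count_pivots(vectors) < len(vectors)


print(is_linearly_dependent([[1, 2, 4], [2, 2, 8], [1, 0, 4]]))  # True
print(is_linearly_dependent([[1, 0, 0], [0, 1, 0], [0, 0, 1]]))  # False
```

The two calls reproduce example (3) of this section. Exact fractions are used so that a row which should reduce to the null vector does so exactly, without rounding error.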

2-8 The concept of a basis. We have already seen that any vector in
$E^n$ can be written as a linear combination of the n unit vectors.
Furthermore, we have noted that any vector in $E^2$ could be represented as a linear
combination of any two linearly independent vectors. A set of vectors
such that any vector in $E^n$ can be represented as a linear combination of
these vectors is of special interest. We make the following definition:

Spanning set: A set of vectors $\mathbf{a}_1, \ldots, \mathbf{a}_r$ from $E^n$ is said to span or
generate $E^n$ if every vector in $E^n$ can be written as a linear combination of
$\mathbf{a}_1, \ldots, \mathbf{a}_r$.

However, we are not looking indiscriminately for any set of vectors
which will span $E^n$; rather, we are seeking a set containing the smallest
number of vectors which will span $E^n$. Any set of vectors spanning $E^n$
which contains the smallest possible number of vectors must be linearly
independent. If the vectors were linearly dependent, we could express
some one of the vectors of the set as a linear combination of the others
and hence eliminate it; in such a case, we should not have had the smallest
set. However, if the set of vectors is linearly independent, then we have
the smallest set, for if we dropped any vector, we could not express every
vector in $E^n$ as a linear combination of the vectors in the set (in particular,
we could not express the vector dropped from the set as a linear
combination of the remaining vectors). Any linearly independent set of vectors
which spans $E^n$ is called a basis for $E^n$:

Basis: A basis for $E^n$ is a linearly independent subset of vectors from $E^n$
which spans the entire space.

A set of vectors which forms a basis has two properties. Its vectors must
span $E^n$ and must be linearly independent. A basis for $E^n$ is by no means
unique. As we shall see, there are an infinite number of bases for $E^n$.

To begin our discussion of bases, we prove that the n unit vectors
$\mathbf{e}_1, \ldots, \mathbf{e}_n$ form a basis for $E^n$. The unit vectors are linearly independent
since the expression

$$\lambda_1\mathbf{e}_1 + \cdots + \lambda_n\mathbf{e}_n = [\lambda_1, \ldots, \lambda_n] = \mathbf{0} \tag{2-51}$$

implies that

$$\lambda_1 = 0, \quad \lambda_2 = 0, \quad \ldots, \quad \lambda_n = 0.$$

As observed previously, any vector x in $E^n$ can be written as a linear
combination of the $\mathbf{e}_i$ in the following way:

$$\mathbf{x} = x_1\mathbf{e}_1 + \cdots + x_n\mathbf{e}_n.$$

Thus the unit vectors for $E^n$ yield a basis for $E^n$.

The representation of any vector in terms of a set of basis vectors is unique,
that is, any vector in $E^n$ can be written as a linear combination of a set of basis
vectors in only one way. Let b be any vector in $E^n$, and $\mathbf{a}_1, \ldots, \mathbf{a}_r$ a set
of basis vectors. Suppose that we can write b as a linear combination of
the $\mathbf{a}_i$ in two different ways, namely:

$$\mathbf{b} = \lambda_1\mathbf{a}_1 + \cdots + \lambda_r\mathbf{a}_r, \tag{2-52}$$

$$\mathbf{b} = \lambda_1'\mathbf{a}_1 + \cdots + \lambda_r'\mathbf{a}_r. \tag{2-53}$$

Subtracting (2-53) from (2-52), we obtain

$$(\lambda_1 - \lambda_1')\mathbf{a}_1 + \cdots + (\lambda_r - \lambda_r')\mathbf{a}_r = \mathbf{0}. \tag{2-54}$$

However, the $\mathbf{a}_i$ are linearly independent and thus

$$(\lambda_1 - \lambda_1') = 0, \quad \ldots, \quad (\lambda_r - \lambda_r') = 0.$$

Hence $\lambda_i = \lambda_i'$, and the linear combination is unique.

It is not at all true that the representation of a vector in terms of an
arbitrary set of vectors is unique. Suppose we are given a set of m vectors
$\mathbf{a}_1 = [a_{11}, \ldots, a_{n1}], \ldots, \mathbf{a}_m = [a_{1m}, \ldots, a_{nm}]$ and the vector
$\mathbf{b} = [b_1, \ldots, b_n]$. We desire to write b as a linear combination of $\mathbf{a}_1, \ldots, \mathbf{a}_m$,
that is,

$$\mathbf{b} = \lambda_1\mathbf{a}_1 + \cdots + \lambda_m\mathbf{a}_m. \tag{2-55}$$

In component form, (2-55) becomes

$$\begin{aligned}
b_1 &= a_{11}\lambda_1 + \cdots + a_{1m}\lambda_m, \\
&\;\;\vdots \\
b_n &= a_{n1}\lambda_1 + \cdots + a_{nm}\lambda_m.
\end{aligned} \tag{2-56}$$

This is a set of n simultaneous linear equations in m unknowns (the $\lambda_i$).
These equations may have no solution, a unique solution, or an infinite
number of solutions. The conditions under which each of these
possibilities can occur will be studied in Chapter 5. If the set of $\mathbf{a}_i$ forms a basis,
then we know that the equations (2-56) have a unique solution. If the
$\mathbf{a}_i$ do not form a basis, there may be a unique solution, no solution, or an
infinite number of solutions.

Example: The vector a = [2, 3, 4] can be written uniquely in terms of
the vectors $\mathbf{e}_1 = [1, 0, 0]$, $\mathbf{e}_2 = [0, 1, 0]$, and $\mathbf{e}_3 = [0, 0, 1]$, which form
a basis for $E^3$:

$$\mathbf{a} = 2\mathbf{e}_1 + 3\mathbf{e}_2 + 4\mathbf{e}_3.$$

Vector a can also be written uniquely in terms of the vectors $\mathbf{e}_1$ and
$\mathbf{b} = [0, \tfrac{3}{4}, 1]$:

$$\mathbf{a} = 2\mathbf{e}_1 + 4\mathbf{b}.$$

However, it is not possible to express a as a linear combination of $\mathbf{e}_1, \mathbf{e}_2$
only. In this case, the set of equations (2-56) has no solution.

If we attempt to write a as a linear combination of the set of vectors
$\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3$, and c = [1, 0, 1], the set of equations (2-56) will provide an
infinite number of solutions. Two possibilities are:

$$\mathbf{a} = 1.5\mathbf{e}_1 + 3\mathbf{e}_2 + 3.5\mathbf{e}_3 + 0.5\mathbf{c},$$

$$\mathbf{a} = \mathbf{e}_1 + 3\mathbf{e}_2 + 3\mathbf{e}_3 + \mathbf{c}.$$
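
The non-uniqueness in this example is easy to confirm by direct computation. The following sketch checks the two representations quoted above and then a one-parameter family $\mathbf{a} = (2 - t)\mathbf{e}_1 + 3\mathbf{e}_2 + (4 - t)\mathbf{e}_3 + t\mathbf{c}$; the family is an observation added here (it follows from comparing components) and the helper name `combine` is hypothetical.

```python
from fractions import Fraction


def combine(coeffs, vectors):
    """Return sum_i coeffs[i] * vectors[i], component by component."""
    n = len(vectors[0])
    return [sum(c * v[k] for c, v in zip(coeffs, vectors)) for k in range(n)]


e1, e2, e3 = [1, 0, 0], [0, 1, 0], [0, 0, 1]
c = [1, 0, 1]
a = [2, 3, 4]
half = Fraction(1, 2)

# The two representations quoted in the example (1.5 = 3/2, 3.5 = 7/2).
first = combine([3 * half, 3, 7 * half, half], [e1, e2, e3, c])
second = combine([1, 3, 3, 1], [e1, e2, e3, c])
print(first == a, second == a)  # True True

# Every value of t gives another representation of the same vector a.
family_ok = all(
    combine([2 - t, 3, 4 - t, t], [e1, e2, e3, c]) == a for t in range(-3, 4)
)
print(family_ok)  # True

# With e1 and e2 alone the third component is always 0, so a = [2, 3, 4]
# cannot be reached: the equations (2-56) have no solution.
print(combine([2, 3], [e1, e2]))  # [2, 3, 0]
```

Setting t = 0.5 and t = 1 in the family recovers the two representations given in the text.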

2-9 Changing a single vector in a basis. We have mentioned earlier
that there is no unique basis for $E^n$. We shall now investigate the
conditions under which an arbitrary vector b from $E^n$ can replace one of the
vectors in a basis so that the new set of vectors is also a basis. The
technique of replacing one vector in a basis by another such that the new set
is also a basis is fundamental to the simplex technique for solving linear
programming problems.

Given a set of basis vectors $\mathbf{a}_1, \ldots, \mathbf{a}_r$ for $E^n$ and any other vector $\mathbf{b} \neq \mathbf{0}$
from $E^n$: Then, if in the expression of b as a linear combination of the $\mathbf{a}_i$,

$$\mathbf{b} = \sum_{i=1}^{r} \alpha_i\mathbf{a}_i, \tag{2-57}$$

any vector $\mathbf{a}_i$ for which $\alpha_i \neq 0$ is removed from the set $\mathbf{a}_1, \ldots, \mathbf{a}_r$, and b is
added to the set, the new collection of r vectors is also a basis for $E^n$.

To prove this statement, note that, since $\mathbf{a}_1, \ldots, \mathbf{a}_r$ form a basis, they
are linearly independent and

$$\sum_{i=1}^{r} \lambda_i\mathbf{a}_i = \mathbf{0} \quad \text{implies} \quad \lambda_i = 0, \quad i = 1, \ldots, r.$$

[Figure 2-9]

We would like to show that the new collection of vectors is also linearly
independent. Without loss of generality, we can assume that in Eq. (2-57)
$\alpha_r \neq 0$, replace $\mathbf{a}_r$ by b, and obtain the new set $\mathbf{a}_1, \ldots, \mathbf{a}_{r-1}$, b. To
demonstrate that this collection is linearly independent, it must be shown
that

$$\sum_{i=1}^{r-1} \delta_i\mathbf{a}_i + \delta\mathbf{b} = \mathbf{0} \quad \text{implies} \quad \delta_i = 0, \quad i = 1, \ldots, r - 1, \quad \text{and} \quad \delta = 0. \tag{2-58}$$

If the set is linearly dependent, then $\delta$ cannot vanish since, by
assumption, $\mathbf{a}_1, \ldots, \mathbf{a}_{r-1}$ are linearly independent.

Suppose $\delta \neq 0$. Using (2-57) in (2-58), we obtain

$$\sum_{i=1}^{r-1} (\delta_i + \alpha_i\delta)\mathbf{a}_i + \delta\alpha_r\mathbf{a}_r = \mathbf{0}.$$

But $\alpha_r\delta \neq 0$. This contradicts the assumption that $\mathbf{a}_1, \ldots, \mathbf{a}_r$ are
linearly independent. Hence $\delta = 0$, and consequently $\delta_i = 0$, $i = 1, \ldots, r - 1$,
which implies that $\mathbf{a}_1, \ldots, \mathbf{a}_{r-1}$, b are linearly independent.

For $\mathbf{a}_1, \ldots, \mathbf{a}_{r-1}$, b to form a basis, it has to be shown that any vector
x in $E^n$ can be represented as a linear combination of this set. Vector x
can be represented as a linear combination of $\mathbf{a}_1, \ldots, \mathbf{a}_r$ because these
vectors form a basis, that is,

$$\mathbf{x} = \sum_{i=1}^{r} \gamma_i\mathbf{a}_i. \tag{2-59}$$

By assumption, $\alpha_r \neq 0$. From (2-57),

$$\mathbf{a}_r = \frac{1}{\alpha_r}\mathbf{b} - \sum_{i=1}^{r-1} \frac{\alpha_i}{\alpha_r}\mathbf{a}_i. \tag{2-60}$$

Substituting (2-60) into (2-59), we obtain

$$\mathbf{x} = \sum_{i=1}^{r-1} \left(\gamma_i - \frac{\alpha_i}{\alpha_r}\gamma_r\right)\mathbf{a}_i + \frac{\gamma_r}{\alpha_r}\mathbf{b}, \tag{2-61}$$

which expresses x as a linear combination of $\mathbf{a}_1, \ldots, \mathbf{a}_{r-1}$, b. Thus
$\mathbf{a}_1, \ldots, \mathbf{a}_{r-1}$, b form a basis for $E^n$.

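If, in (2-57), b replaces a vector $\mathbf{a}_i$ for which $\alpha_i = 0$, then the new set of
vectors is linearly dependent and does not form a basis for $E^n$. To prove
this, take $\alpha_r = 0$ and replace $\mathbf{a}_r$ by b. Then

$$\mathbf{b} = \sum_{i=1}^{r-1} \alpha_i\mathbf{a}_i,$$

or

$$\mathbf{b} - \sum_{i=1}^{r-1} \alpha_i\mathbf{a}_i = \mathbf{0}. \tag{2-62}$$

Since the coefficient of b in (2-62) is not zero, the vectors
$\mathbf{a}_1, \ldots, \mathbf{a}_{r-1}$, b are linearly dependent.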

Example: Imagine that we want to insert the vector b = [3, 0, 4] into
the basis $\mathbf{e}_1, \mathbf{e}_2, \mathbf{e}_3$ and remove one of the vectors in the basis. We have

$$\mathbf{b} = 3\mathbf{e}_1 + 0\mathbf{e}_2 + 4\mathbf{e}_3.$$

According to the preceding discussion, we can remove either $\mathbf{e}_1$ or $\mathbf{e}_3$
to obtain a new basis; that is, $\mathbf{e}_2, \mathbf{e}_3$, b or $\mathbf{e}_1, \mathbf{e}_2$, b will form a basis for $E^3$.
We cannot remove the vector $\mathbf{e}_2$ and still maintain a basis. This is
illustrated geometrically in Fig. 2-9. If $\mathbf{e}_2$ is removed, $\mathbf{e}_1, \mathbf{e}_3$, b all lie in the
$x_1x_3$-plane and are linearly dependent. If either $\mathbf{e}_1$ or $\mathbf{e}_3$ is removed, the
new set of vectors does not lie in a plane.
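
The replacement rule can be sketched in code: compute the $\alpha_i$ of (2-57) and report the positions where $\alpha_i \neq 0$. The sketch assumes the given vectors really do form a basis for $E^n$ and solves for the $\alpha_i$ by Gauss-Jordan elimination with exact fractions, a method not developed in this chapter. The names `coordinates` and `replaceable_positions` are hypothetical, and positions are counted from 0.

```python
from fractions import Fraction


def coordinates(basis, b):
    """Solve sum_i alpha_i * basis[i] = b for the alpha_i of (2-57)."""
    n = len(b)
    if len(basis) != n or any(len(v) != n for v in basis):
        raise ValueError("expected n vectors with n components each")
    # Augmented matrix: column j holds basis[j]; the last column holds b.
    m = [
        [Fraction(basis[j][i]) for j in range(n)] + [Fraction(b[i])]
        for i in range(n)
    ]
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            raise ValueError("the given vectors do not form a basis")
        m[col], m[pivot] = m[pivot], m[col]
        pivot_value = m[col][col]
        m[col] = [x / pivot_value for x in m[col]]
        # Clear this column in every other row.
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [x - factor * p for x, p in zip(m[r], m[col])]
    return [m[i][n] for i in range(n)]


def replaceable_positions(basis, b):
    """Positions i whose basis vector may be exchanged for b (alpha_i != 0)."""
    return [i for i, alpha in enumerate(coordinates(basis, b)) if alpha != 0]


unit_basis = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(replaceable_positions(unit_basis, [3, 0, 4]))  # [0, 2]
```

The result `[0, 2]` corresponds to $\mathbf{e}_1$ and $\mathbf{e}_3$ in the example: either may be removed, while $\mathbf{e}_2$ (position 1) may not.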


2-10 Number of vectors in a basis for $E^n$. The preceding theorems on
bases have not depended on the actual number of vectors in the basis.
However, our results have led us to suspect that every basis for $E^n$
contains the same number of basis vectors, and that there are precisely n
vectors in every basis for $E^n$. This is indeed correct, as we shall now show.

We shall begin by proving that any two bases for $E^n$ have the same number
of basis vectors. Let $\mathbf{a}_1, \ldots, \mathbf{a}_u$ be one set of basis vectors for $E^n$, and
$\mathbf{b}_1, \ldots, \mathbf{b}_v$ be any other set of basis vectors. The theorem states that
u = v. To prove it, note that $\mathbf{b}_v$ can be expressed as a linear combination
of $\mathbf{a}_1, \ldots, \mathbf{a}_u$:

$$\mathbf{b}_v = \sum_{i=1}^{u} \lambda_i\mathbf{a}_i,$$

and that at least one $\lambda_i \neq 0$ (since the null vector cannot be a member of
a linearly independent set). Without loss of generality, we can set $\lambda_u \neq 0$.
From the results of the preceding section we know that $\mathbf{a}_1, \ldots, \mathbf{a}_{u-1}, \mathbf{b}_v$
form a basis for $E^n$. Hence $\mathbf{b}_{v-1}$ can be written as a linear combination
of this set of basis vectors,

$$\mathbf{b}_{v-1} = \sum_{i=1}^{u-1} \delta_i\mathbf{a}_i + \delta\mathbf{b}_v.$$

At least one $\delta_i \neq 0$; otherwise $\mathbf{b}_{v-1}$ would be merely a scalar multiple of
$\mathbf{b}_v$, contradicting the fact that the $\mathbf{b}_j$ are linearly independent. We can
assume that $\delta_{u-1} \neq 0$. Hence, the set $\mathbf{a}_1, \ldots, \mathbf{a}_{u-2}, \mathbf{b}_{v-1}, \mathbf{b}_v$ forms a
basis for $E^n$. This process can be continued until a basis is found which
must take one of the following forms:

$$\mathbf{a}_1, \ldots, \mathbf{a}_{u-v}, \mathbf{b}_1, \ldots, \mathbf{b}_v \quad \text{or} \quad \mathbf{b}_1, \ldots, \mathbf{b}_v. \tag{2-63}$$

There must be at least as many a* as there are by; otherwise a basis of the 
form b v _ w+1 , . . . , bv would be obtained, and the remaining by would be 
linearly dependent on bv- M +i, . . . , b v , contradicting the fact that all the 
by are linearly independent. Thus we conclude that 

u > v. (2-64) 

However, one can start with the by and insert the a,. This procedure would 
yield 

v > u. (2-65) 

Therefore u = v, and the theorem is proved. 

To determine the actual number of vectors in a basis for E n , it is only 
necessary to find a basis for E n and count the number of vectors in this 
basis. We have already shown that the n unit vectors e x , . . . , e n form a 
basis for E n . It immediately follows that every basis for E n contains pre¬ 
cisely n vectors. 

We can now see that any set of $n$ linearly independent vectors from $E^n$
forms a basis for $E^n$. Let $\mathbf{a}_1, \ldots, \mathbf{a}_n$ be such a set of linearly independent
vectors. To prove this result, we only need to start with the basis
$\mathbf{e}_1, \ldots, \mathbf{e}_n$ and insert the $\mathbf{a}_i$. After $n$ insertions, which can always be made
(for the same reasons used in proving that two bases always have the
same number of elements), we obtain a new basis $\mathbf{a}_1, \ldots, \mathbf{a}_n$. Thus, it
is obvious that any $n + 1$ vectors from $E^n$ are linearly dependent, since the
assumption that they are linearly independent leads to a contradiction,
because any subset of $n$ of them is a basis, and the remaining vector could
be expressed as a linear combination of the basis vectors.

Any $m < n$ linearly independent vectors from $E^n$ form part of a basis
for $E^n$, that is, if $\mathbf{a}_1, \ldots, \mathbf{a}_m$ from $E^n$ are linearly independent, then
$n - m$ additional vectors in $E^n$ can be found such that $\mathbf{a}_1, \ldots, \mathbf{a}_m$ and
the other $n - m$ vectors in $E^n$ form a basis for $E^n$. Again, the proof
depends on starting with a basis and inserting the $\mathbf{a}_i$ in precisely the same
way as was done before. If we start with unit vectors as a basis, we see
that the additional $n - m$ vectors used to fill out the basis can be unit
vectors. As a specific example of this theorem, any non-null vector from
$E^n$ is part of a basis and, therefore, there is an infinite number of bases
for $E^n$.
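
The filling-out of a basis with unit vectors can be sketched in Python. This
is an illustration only, not the insertion procedure of the text: instead of
replacing vectors one at a time, it appends each unit vector that keeps the
set linearly independent, and it tests independence by exact Gaussian
elimination (a technique assumed here, not developed in this chapter). The
helper names rank and extend_to_basis are hypothetical, and the input
vectors are assumed to be linearly independent.

```python
from fractions import Fraction


def rank(vectors):
    """Number of linearly independent vectors, by exact row reduction."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    ncols = len(rows[0]) if rows else 0
    r = 0  # number of pivots found so far
    for c in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][c] / rows[r][c]
            rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r


def extend_to_basis(vectors, n):
    """Append unit vectors to an independent set until it has n members."""
    basis = [list(v) for v in vectors]
    for i in range(n):
        if len(basis) == n:
            break
        e = [1 if j == i else 0 for j in range(n)]
        if rank(basis + [e]) == len(basis) + 1:  # e is independent of basis
            basis.append(e)
    return basis


basis = extend_to_basis([[3, 0, 4]], 3)
print(basis)  # prints [[3, 0, 4], [1, 0, 0], [0, 1, 0]]
```

The same rank helper also illustrates that $n + 1$ vectors from $E^n$ cannot be
independent: rank([[1, 0], [0, 1], [2, 3]]) is 2, not 3.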

We have characterized the dimension n of a euclidean space by the 
number of components in the set of vectors which form the space, that is, 
E n is the set of all n-component vectors. We have shown that the maxi¬ 
mum number of linearly independent vectors in E n is n. In some ways, 
it is more desirable to define the dimension of a space in terms of the 
maximum number of linearly independent vectors which can exist in that 
space. This definition avoids connecting the dimension of a space to the 
number of components in the vectors forming the space. The advantage 
of this approach will become clear in the discussion of subspaces. 


*2-11 Orthogonal bases. It will be recalled that two n-component 
vectors a, b are orthogonal if a'b = 0. Let us suppose that we have n 
vectors Vi, . . ., v n from E n which are mutually orthogonal and all differ¬ 
ent from 0 so that 

$$\mathbf{v}_i'\mathbf{v}_j = 0 \quad (\text{all } i, j;\ i \neq j). \tag{2-66}$$

This set of vectors forms a basis for $E^n$. The proof follows immediately
if we can show that the set $\mathbf{v}_1, \ldots, \mathbf{v}_n$ is linearly independent, since any
set of $n$ linearly independent vectors from $E^n$ forms a basis for $E^n$.
Consider the problem of finding the $\lambda_i$ which will satisfy

$$\lambda_1\mathbf{v}_1 + \cdots + \lambda_n\mathbf{v}_n = \mathbf{0}. \tag{2-67}$$

Forming the scalar product of $\mathbf{v}_1$ and (2-67), we obtain

$$\lambda_1|\mathbf{v}_1|^2 + \lambda_2\mathbf{v}_1'\mathbf{v}_2 + \cdots + \lambda_n\mathbf{v}_1'\mathbf{v}_n = \mathbf{v}_1'\mathbf{0} = 0,$$

and by (2-66)

$$\lambda_1 = 0, \tag{2-68}$$

since, by assumption, $|\mathbf{v}_1|^2 \neq 0$. If we form the scalar product of (2-67)
and $\mathbf{v}_2$, we obtain $\lambda_2 = 0$, etc. Hence, each $\lambda_i$ $(i = 1, \ldots, n)$ is zero, so
the vectors are linearly independent and form a basis for $E^n$. Thus any
set of $n$ mutually orthogonal nonzero vectors from $E^n$ yields a basis for $E^n$.

* Starred sections can be omitted without loss of continuity.

Let us divide each vector $\mathbf{v}_i$ by its length $|\mathbf{v}_i|$ and write

$$\mathbf{u}_i = \frac{\mathbf{v}_i}{|\mathbf{v}_i|}, \quad i = 1, \ldots, n. \tag{2-69}$$

This can be done since $|\mathbf{v}_i| \neq 0$. The $\mathbf{u}_i$ represent vectors of unit length;
thus

$$\mathbf{u}_i'\mathbf{u}_i = 1, \quad \mathbf{u}_i'\mathbf{u}_j = 0 \quad (i \neq j). \tag{2-70}$$

Orthonormal basis: A set of $n$ mutually orthogonal vectors of unit
length from $E^n$ forms what is called an orthonormal basis for $E^n$.

Orthonormal bases are especially interesting because any vector $\mathbf{x}$
from $E^n$ can be expressed very easily as a linear combination of the
orthonormal basis vectors. If $\mathbf{u}_1, \ldots, \mathbf{u}_n$ form an orthonormal basis for
$E^n$, and we want to write

$$\mathbf{x} = \lambda_1\mathbf{u}_1 + \cdots + \lambda_n\mathbf{u}_n, \tag{2-71}$$

then the scalar product of $\mathbf{u}_i$ and (2-71) will yield

$$\lambda_i = \mathbf{u}_i'\mathbf{x} \quad [\text{see (2-70)}]. \tag{2-72}$$

The scalar $\lambda_i$ is found simply by forming the scalar product of $\mathbf{u}_i$ and $\mathbf{x}$.
Note: the unit vectors $\mathbf{e}_1, \ldots, \mathbf{e}_n$ form an orthonormal basis for $E^n$.

Since any set of mutually orthogonal nonzero $n$-component vectors is
linearly independent (proof?), it is not possible to have $n + 1$ mutually
orthogonal nonzero vectors in $E^n$.
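
Equation (2-72) can be tried on a small case. The following Python sketch
is an illustration added here, with an orthonormal basis of $E^2$ and a vector
$\mathbf{x}$ chosen arbitrarily. It computes each $\lambda_i$ as a scalar product and then
rebuilds $\mathbf{x}$ from (2-71). The names dot, lams, and rebuilt are hypothetical.

```python
import math


def dot(a, b):
    """Scalar product of two vectors of equal length."""
    return sum(p * q for p, q in zip(a, b))


s = 1 / math.sqrt(2)
u = [[s, s], [s, -s]]       # an orthonormal basis for E^2 (chosen for illustration)
x = [3.0, 1.0]

lams = [dot(ui, x) for ui in u]                      # lambda_i = u_i' x, as in (2-72)
rebuilt = [sum(l * ui[k] for l, ui in zip(lams, u))  # sum of lambda_i u_i, as in (2-71)
           for k in range(len(x))]
print(lams, rebuilt)
```

Up to floating-point rounding, rebuilt agrees with $\mathbf{x}$, so no system of
equations has to be solved to find the coefficients.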

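Any set of $n$ given linearly independent vectors from $E^n$ can be
converted into an orthonormal basis by a procedure known as the Schmidt
orthogonalization process. Let us suppose that $\mathbf{a}_1, \ldots, \mathbf{a}_n$ are $n$ linearly
independent vectors from $E^n$. We select any vector from this set, for
example, $\mathbf{a}_1$. This vector defines a direction in space, and we build the
orthonormal set around it. Let us define the vector of unit length $\mathbf{u}_1$ as

$$\mathbf{u}_1 = \frac{\mathbf{a}_1}{|\mathbf{a}_1|}. \tag{2-73}$$

To obtain a vector $\mathbf{v}_2$ orthogonal to $\mathbf{u}_1$, we subtract from $\mathbf{a}_2$ a scalar
multiple of $\mathbf{u}_1$; that is, $\mathbf{v}_2$ is expressed as

$$\mathbf{v}_2 = \mathbf{a}_2 - \alpha_1\mathbf{u}_1, \tag{2-74}$$

and $\alpha_1$ is determined so that $\mathbf{u}_1'\mathbf{v}_2 = 0$. Hence

$$\alpha_1 = \mathbf{u}_1'\mathbf{a}_2,$$

which can be thought of as the component of $\mathbf{a}_2$ along $\mathbf{u}_1$. Also, the
vector $\alpha_1\mathbf{u}_1$ may be interpreted as the vector component of $\mathbf{a}_2$ along $\mathbf{u}_1$.
Thus,

$$\mathbf{v}_2 = \mathbf{a}_2 - (\mathbf{u}_1'\mathbf{a}_2)\mathbf{u}_1. \tag{2-75}$$

A second unit vector orthogonal to $\mathbf{u}_1$ is defined by

$$\mathbf{u}_2 = \frac{\mathbf{v}_2}{|\mathbf{v}_2|}; \tag{2-76}$$

this can be done since $|\mathbf{v}_2| \neq 0$ (why?). A vector orthogonal to $\mathbf{u}_1, \mathbf{u}_2$ is
found by subtracting from $\mathbf{a}_3$ the vector components of $\mathbf{a}_3$ along $\mathbf{u}_1, \mathbf{u}_2$.
This gives

$$\mathbf{v}_3 = \mathbf{a}_3 - (\mathbf{u}_1'\mathbf{a}_3)\mathbf{u}_1 - (\mathbf{u}_2'\mathbf{a}_3)\mathbf{u}_2. \tag{2-77}$$

The vector $\mathbf{v}_3$ is clearly orthogonal to $\mathbf{u}_1, \mathbf{u}_2$. The third unit vector which
is orthogonal to $\mathbf{u}_1, \mathbf{u}_2$ is

$$\mathbf{u}_3 = \frac{\mathbf{v}_3}{|\mathbf{v}_3|}. \tag{2-78}$$

This procedure is continued until an orthonormal basis is obtained. In
general,

$$\mathbf{v}_r = \mathbf{a}_r - \sum_{i=1}^{r-1} (\mathbf{u}_i'\mathbf{a}_r)\mathbf{u}_i, \tag{2-79}$$

$$\mathbf{u}_r = \frac{\mathbf{v}_r}{|\mathbf{v}_r|}. \tag{2-80}$$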


Example: Using the Schmidt process, construct an orthonormal basis
from $\mathbf{a}_1 = [2, 3, 0]$, $\mathbf{a}_2 = [6, 1, 0]$, and $\mathbf{a}_3 = [0, 2, 4]$. If we plot the
vectors $\mathbf{a}_1, \mathbf{a}_2, \mathbf{a}_3$, it is easily seen that they are linearly independent since
they do not lie in a plane. Then

$$\mathbf{u}_1 = \frac{\mathbf{a}_1}{|\mathbf{a}_1|} = (13)^{-1/2}[2, 3, 0] = [0.555, 0.832, 0];$$

$$\mathbf{v}_2 = \mathbf{a}_2 - (\mathbf{u}_1'\mathbf{a}_2)\mathbf{u}_1;$$

$$\mathbf{u}_1'\mathbf{a}_2 = 4.160, \quad (\mathbf{u}_1'\mathbf{a}_2)\mathbf{u}_1 = [2.308, 3.462, 0];$$

$$\mathbf{v}_2 = [3.692, -2.462, 0];$$

$$\mathbf{u}_2 = \frac{\mathbf{v}_2}{|\mathbf{v}_2|} = [0.832, -0.555, 0];$$

$$\mathbf{v}_3 = \mathbf{a}_3 - (\mathbf{u}_1'\mathbf{a}_3)\mathbf{u}_1 - (\mathbf{u}_2'\mathbf{a}_3)\mathbf{u}_2;$$

$$\mathbf{u}_1'\mathbf{a}_3 = 1.664, \quad \mathbf{u}_2'\mathbf{a}_3 = -1.109;$$

$$(\mathbf{u}_1'\mathbf{a}_3)\mathbf{u}_1 = [0.923, 1.385, 0], \quad (\mathbf{u}_2'\mathbf{a}_3)\mathbf{u}_2 = [-0.923, 0.615, 0];$$

$$\mathbf{v}_3 = [0, 0, 4];$$

$$\mathbf{u}_3 = [0, 0, 1].$$

The fact that any set of $m < n$ linearly independent vectors from $E^n$
constitutes part of a basis, together with the Schmidt process, demonstrates that
any set of $m < n$ mutually orthogonal vectors of unit length in $E^n$ is
part of an orthonormal basis; that is, there exist $n - m$ vectors of unit
length in $E^n$ which, together with the $m$ orthogonal vectors of unit length,
form an orthonormal basis.
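
The Schmidt process of (2-79) and (2-80) translates almost line for line
into a short program. The Python sketch below is an illustration, not part of
the text. The function name schmidt and the tolerance 1e-12 used to detect a
linearly dependent input are assumptions made for the example. It is run on
the three vectors of the example above.

```python
import math


def dot(a, b):
    """Scalar product of two vectors of equal length."""
    return sum(p * q for p, q in zip(a, b))


def schmidt(vectors):
    """Orthonormal vectors u_1, ..., u_n built from independent a_1, ..., a_n."""
    basis = []
    for a in vectors:
        v = list(a)
        for u in basis:
            c = dot(u, a)                       # component of a_r along u_i
            v = [vk - c * uk for vk, uk in zip(v, u)]
        length = math.sqrt(dot(v, v))
        if length < 1e-12:                      # assumed tolerance
            raise ValueError("input vectors are linearly dependent")
        basis.append([vk / length for vk in v])  # u_r = v_r / |v_r|
    return basis


u1, u2, u3 = schmidt([[2, 3, 0], [6, 1, 0], [0, 2, 4]])
print([round(c, 3) for c in u1])
print([round(c, 3) for c in u2])
print([round(c, 3) for c in u3])
```

Each new vector is reduced only against the unit vectors already found, so
the work grows with the square of the number of vectors. In floating-point
arithmetic the computed vectors are orthogonal only up to rounding error.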

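If two nonzero vectors $\mathbf{a}, \mathbf{b}$ from $E^n$ are both orthogonal to $n - 1$ mutually
orthogonal vectors of unit length $\mathbf{u}_1, \ldots, \mathbf{u}_{n-1}$ from $E^n$, then $\mathbf{b}$ is a scalar
multiple of $\mathbf{a}$. This follows immediately since $\mathbf{a}, \mathbf{u}_1, \ldots, \mathbf{u}_{n-1}$ form a basis,
and

$$\mathbf{b} = \lambda\mathbf{a} + \lambda_1\mathbf{u}_1 + \cdots + \lambda_{n-1}\mathbf{u}_{n-1}.$$

However,

$$\mathbf{u}_i'\mathbf{b} = 0, \quad \mathbf{u}_i'\mathbf{a} = 0 \quad (i = 1, \ldots, n - 1), \quad \mathbf{u}_i'\mathbf{u}_j = 0 \quad (i \neq j);$$

hence

$$\lambda_i = 0 \quad (i = 1, \ldots, n - 1),$$

and

$$\mathbf{b} = \lambda\mathbf{a}.$$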


*2-12 Generalized coordinate systems. When writing a vector a = 
[ai, . . . , a n ] as an ordered n-tuple of numbers, we have assumed that the 
components a t lie along the coordinate axes defined by the unit vectors 
ei, . . . , e n . Our preceding discussions were all implicitly based on a 
coordinate system defined by unit vectors. There is no reason, however, 
why we should be limited to one coordinate system since any set of basis 
vectors can define a system of coordinates. 

Let us consider a set of basis vectors $\mathbf{v}_1, \ldots, \mathbf{v}_n$ for $E^n$. For convenience,
we shall assume that each $\mathbf{v}_i$ is of unit length. If $\mathbf{x}$ is any point in $E^n$,
then

$$\mathbf{x} = \alpha_1\mathbf{v}_1 + \cdots + \alpha_n\mathbf{v}_n. \tag{2-81}$$

In (2-81), all vectors $\mathbf{x}, \mathbf{v}_i$ can be thought of as referred to the basic
coordinate system defined by $\mathbf{e}_1, \ldots, \mathbf{e}_n$. However, $\mathbf{x}$ can be equally
well characterized by the numbers $\alpha_1, \ldots, \alpha_n$. For example, the vector
$\mathbf{x}_v = [\alpha_1, \ldots, \alpha_n]$ can be considered to be the representation of $\mathbf{x}$ in the
coordinate system defined by the basis vectors $\mathbf{v}_1, \ldots, \mathbf{v}_n$. A subscript
is placed on $\mathbf{x}$ to indicate that $\mathbf{x}_v$ is not referred to the coordinate system
defined by $\mathbf{e}_1, \ldots, \mathbf{e}_n$, but instead to the coordinate system defined by
the $\mathbf{v}_i$.

If $\lambda$ is any scalar, multiplication of (2-81) by $\lambda$ shows that

$$(\lambda\mathbf{x})_v = \lambda\mathbf{x}_v = [\lambda\alpha_1, \ldots, \lambda\alpha_n]. \tag{2-82}$$

Assuming $\mathbf{y}$ to be any other vector in $E^n$, we obtain

$$\mathbf{y} = \beta_1\mathbf{v}_1 + \cdots + \beta_n\mathbf{v}_n, \tag{2-83}$$

and

$$\mathbf{y}_v = [\beta_1, \ldots, \beta_n]. \tag{2-84}$$

Adding (2-81) and (2-83), we see that

$$(\mathbf{x} + \mathbf{y})_v = \mathbf{x}_v + \mathbf{y}_v = [\alpha_1 + \beta_1, \ldots, \alpha_n + \beta_n]. \tag{2-85}$$

Multiplication by a scalar and addition have precisely the same form
in an arbitrary coordinate system as in the system defined by $\mathbf{e}_1, \ldots, \mathbf{e}_n$.

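The scalar product, however, does not have the form (2-31) in an
arbitrary coordinate system. Forming the scalar product of $\mathbf{y}, \mathbf{x}$ by means of
(2-81) and (2-83), and recalling that each $\mathbf{v}_i$ is of unit length, we see that

$$\mathbf{y}'\mathbf{x} = \sum_{i=1}^{n}\sum_{j=1}^{n} \alpha_i\beta_j\,\mathbf{v}_i'\mathbf{v}_j = \sum_{i=1}^{n}\sum_{j=1}^{n} \alpha_i\beta_j \cos\theta_{ij}
= \sum_{i=1}^{n} \alpha_i\beta_i + \sum_{i=1}^{n}\sum_{j \neq i} \alpha_i\beta_j \cos\theta_{ij}, \tag{2-86}$$

where $\theta_{ij}$ is the angle between $\mathbf{v}_i$ and $\mathbf{v}_j$. Equation (2-86) is not usually

Let us suppose that $\mathbf{v}_1, \ldots, \mathbf{v}_n$ is an orthonormal basis. Equation (2-86)
then becomes

$$\mathbf{y}'\mathbf{x} = \sum_{i=1}^{n} \alpha_i\beta_i = \mathbf{y}_v'\mathbf{x}_v. \tag{2-87}$$

Hence, in any orthogonal coordinate system, all the operations with
vectors, including the scalar product, can be performed in the same way as
in the coordinate system defined by $\mathbf{e}_1, \ldots, \mathbf{e}_n$. For this reason,
orthogonal coordinate systems are particularly useful.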
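
The difference between (2-86) and (2-87) shows up in a small numerical
case. The Python sketch below is an added illustration. The basis (two unit
vectors in $E^2$ at an angle of 60 degrees) and the coordinates alpha, beta are
arbitrary choices, and all names are hypothetical. It compares the true scalar
product with the double sum of (2-86) and with the simple sum of products of
coordinates, which is valid only for an orthonormal basis.

```python
import math


def dot(a, b):
    """Scalar product referred to the unit vectors e_1, ..., e_n."""
    return sum(p * q for p, q in zip(a, b))


# Two unit-length basis vectors for E^2 that are not orthogonal.
v = [[1.0, 0.0], [math.cos(math.pi / 3), math.sin(math.pi / 3)]]
alpha = [2.0, 1.0]   # coordinates of x in the v-system
beta = [1.0, 3.0]    # coordinates of y in the v-system

x = [sum(alpha[i] * v[i][k] for i in range(2)) for k in range(2)]
y = [sum(beta[i] * v[i][k] for i in range(2)) for k in range(2)]

direct = dot(y, x)                                   # y'x computed from components
general = sum(alpha[i] * beta[j] * dot(v[i], v[j])   # double sum of (2-86)
              for i in range(2) for j in range(2))
naive = sum(a * b for a, b in zip(alpha, beta))      # form (2-87), not valid here
print(direct, general, naive)
```

Here direct and general both come out as 8.5 up to rounding, while the sum
of products of coordinates is 5, so the cross terms in $\cos\theta_{ij}$ cannot be
dropped unless the basis vectors are mutually orthogonal.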

Many mathematics texts do not start out with defining a vector as an
ordered $n$-tuple of numbers. Instead, they define vectors as algebraic
elements for which the operations of addition and multiplication by a
scalar are defined and assumed to possess the properties (2-23) through
(2-26). The entire theory is then developed without referring to an
$n$-tuple. This approach is abstract and tends to be somewhat difficult
for students with little mathematical background. However, it does have
the great advantage that it avoids the problem of coordinate systems
since vectors are defined without reference to a coordinate system. Hence
it follows immediately that the theory holds for any coordinate system.
[The scalar product is also defined abstractly in terms of its properties,
and then it is shown that, for orthogonal coordinate systems, it takes
the concrete form of Eq. (2-31).]


2-13 Vector spaces and subspaces. For certain purposes, it is desirable 
to treat collections of vectors from a slightly more abstract point of view 
than we have as yet presented. We shall introduce now the concept of 
a vector space. 

Vector space: A vector space is a collection of vectors which is closed 

under the operations of addition and multiplication by a scalar. 

The expression, "a set of vectors is closed under addition and
multiplication by a scalar," means that if $\mathbf{a}, \mathbf{b}$ are in the collection, the sum $\mathbf{a} + \mathbf{b}$
is also in the collection, and if $\mathbf{a}$ is in the collection, $\lambda\mathbf{a}$ is in the collection
for any scalar $\lambda$.

The definition of a vector space says nothing about a scalar product,
length, or distance. These concepts do not need to be defined in a vector
space. Only addition and multiplication by a scalar need to be defined.
From the closure property, we see that if $\mathbf{a}$ is in a vector space, so is $-\mathbf{a}$,
since $(-1)\mathbf{a}$ is in the collection. Similarly, the null vector is always in a
vector space, since $\lambda\mathbf{a}$ is in the space, and we can take $\lambda = 0$. The
totality of all $n$-component vectors is called an $n$-dimensional vector space and
is denoted by $V_n$. The space $V_n$ is identical with the $n$-dimensional
euclidean space $E^n$ if length is defined in $V_n$ as in $E^n$. Although $E^n$ is clearly
a vector space, it does not follow that any $V_n$ is an $E^n$.

Out of all the vectors in $V_n$, it is possible to find various subsets of
vectors which are themselves vector spaces. For example, the set of all
three-component vectors forms a three-dimensional vector space.
However, if we consider all three-component vectors lying in the plane $x_2 = x_3$,
that is, all vectors of the form $[x_1, x_2, x_2]$, then this collection is also a
vector space. Although it is composed of three-component vectors, we
do not call it a three-dimensional vector space since all the vectors lie in
a plane. Such a vector space is referred to as a subspace of $V_3$.

Subspace: A subspace S n of the n-dimensional vector space V n is defined 

to be a subset of V n which is itself a vector space. 

The subscript $n$ on $S_n$ means that the vectors have $n$ components, i.e.,
are elements of $V_n$. In assigning a dimension to a subspace, we shall find
it convenient to define the dimension of a space as the maximum number
of linearly independent vectors in the space, rather than to refer to the
number of components in the vectors generating the space. We define the
dimension of a subspace $S_n$ as the maximum number of linearly independent
vectors in the subspace. In our example, any vector $[x_1, x_2, x_2]$ can be
written

$$[x_1, x_2, x_2] = x_1\mathbf{e}_1 + x_2[0, 1, 1].$$

Figure 2-10

The maximum number of linearly independent vectors in this subspace is
two; hence its dimension is two. This is precisely what we want intuitively
because we know that all the vectors lie in a plane. To simplify later
discussions, we consider $S_n = V_n$ to be a subspace of $V_n$ having dimension $n$,
i.e., the subspace is all of $V_n$.

A clear geometrical interpretation of subspaces in E 3 can be given. 
Any subspace of E 3 is either E 3 itself, a plane through the origin, a line 
through the origin, or just the origin itself (this is a subspace of a single 
element and has dimension 0 ). 

Example: The collection of all vectors $\lambda[1, 1, 1]$ for any scalar $\lambda$ forms a
subspace of $E^3$ since

$$\lambda_1[1, 1, 1] + \lambda_2[1, 1, 1] = (\lambda_1 + \lambda_2)[1, 1, 1],$$

which is also an element of the collection. This collection of vectors lies
on the line which passes through the origin and the point $[1, 1, 1]$ (see line
in Fig. 2-10).

The collection of all vectors $[x_1, 0, x_3]$ is a subspace of $E^3$ and
represents the $x_1x_3$-plane (see $S_3$ in Fig. 2-10).

The collection of all linear combinations of $m$ $n$-component vectors
$\mathbf{a}_1, \ldots, \mathbf{a}_m$ is a subspace of $V_n$. To prove this, note that any vectors $\mathbf{a}, \mathbf{b}$
in the collection can be written

$$\mathbf{a} = \sum_{i=1}^{m} \alpha_i\mathbf{a}_i, \qquad \mathbf{b} = \sum_{i=1}^{m} \alpha_i'\mathbf{a}_i.$$

Then

$$\lambda\mathbf{a} = \sum_{i=1}^{m} (\lambda\alpha_i)\mathbf{a}_i$$

is also a linear combination of the $\mathbf{a}_i$ and is in the collection. Furthermore,

$$\mathbf{a} + \mathbf{b} = \sum_{i=1}^{m} (\alpha_i + \alpha_i')\mathbf{a}_i$$

is a linear combination of the $\mathbf{a}_i$ and is in the collection.

A set of vectors from a subspace S n is said to span or generate S n if 
every vector in S n can be written as a linear combination of this set of 
vectors. A linearly independent set of vectors from S n which spans S n 
is called a basis for S n . If S n has dimension k , then a basis for S n will 
contain precisely k vectors. The arguments used in Section 2-10 show 
that if S n has dimension k , any k linearly independent vectors from S n 
are a basis for S n . 
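
For the plane $x_2 = x_3$ discussed earlier in this section, the vectors $\mathbf{e}_1$ and
$[0, 1, 1]$ are a basis, and the closure properties can be checked on particular
vectors. The Python sketch below is an added illustration with hypothetical
helper names (in_plane, coords, combine). It tests membership in the subspace,
reads off the two coefficients of a vector with respect to this basis, and
rebuilds the vector from them.

```python
def in_plane(v):
    """True when the three-component vector v satisfies x2 = x3."""
    return v[1] == v[2]


def coords(v):
    """Coefficients (c1, c2) with v = c1*e1 + c2*[0, 1, 1]."""
    if not in_plane(v):
        raise ValueError("vector is not in the subspace x2 = x3")
    return (v[0], v[1])


def combine(c1, c2):
    """The linear combination c1*e1 + c2*[0, 1, 1]."""
    e1, d = [1, 0, 0], [0, 1, 1]
    return [c1 * p + c2 * q for p, q in zip(e1, d)]


a, b = [2, 5, 5], [-1, 3, 3]
total = [p + q for p, q in zip(a, b)]   # a + b
scaled = [4 * p for p in a]             # 4a
print(in_plane(total), in_plane(scaled), coords(total))  # prints True True (1, 8)
```

A check on two vectors is of course not a proof of closure. It only
illustrates the definitions of span, basis, and dimension for this subspace.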

If $S_n$ is a subspace of $E^n$ having dimension $k$, then, except for notation,
$S_n$ is the same as $E^k$. The vectors in $S_n$ will have $n$ components rather
than $k$ components. However, it is possible to establish a unique
correspondence between the vectors in $S_n$ and the vectors in $E^k$. For example,
the subspace of $E^3$ (discussed above) which is the collection of all vectors
$[x_1, x_2, x_2]$ has dimension 2. Corresponding to any vector in this subspace
there is one and only one vector $[x_1, x_2]$ in $E^2$, and for any vector $[x_1, x_2]$
in $E^2$ there is one and only one vector $[x_1, x_2, x_2]$ in the subspace. We say
that a subspace of $E^n$ of dimension $k$ is isomorphic to $E^k$.

Problems

2-1. Given the vectors a = (2, 1), b = (1, 3), plot the vectors —2a, a+ b, 
and illustrate the use of the parallelogram rule to find a + b, 2a + Jb. 

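2-2. Given $\mathbf{a} = [a_1, \ldots, a_n]$, $\mathbf{b} = [b_1, \ldots, b_n]$, $\mathbf{c} = [c_1, \ldots, c_n]$, show by
direct computation that

$$(\mathbf{a} + \mathbf{b}) + \mathbf{c} = \mathbf{a} + (\mathbf{b} + \mathbf{c}) = \mathbf{a} + \mathbf{b} + \mathbf{c},$$

$$\mathbf{a}'\mathbf{b} = \mathbf{b}'\mathbf{a},$$

$$|\mathbf{a} - \mathbf{b}| = |(\mathbf{a} - \mathbf{c}) - (\mathbf{b} - \mathbf{c})|.$$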

2-3. If a = [3, 2, 1], b = [1, 5, 6], solve the following vector equation for x: 

3a + 2x = 5b. 

2-4. If a is an n-component vector, show by direct computation that 

a + 0 = 0 + a = a, 3a = a + a + a, a'O = 0. 

Show that the equation a + x = 0 has the unique solution x = —a. 

2-5. Show that the vectors [2, 3, 1], [1, 0, 4], [2, 4, 1], [0, 3, 2] are linearly 
dependent. 

2-6. Express $\mathbf{x} = [4, 5]$ as a linear combination of $\mathbf{a} = [1, 3]$, $\mathbf{b} = [2, 2]$.

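2-7. Given three linearly independent vectors $\mathbf{a}_1, \mathbf{a}_2, \mathbf{a}_3$ from $E^3$, obtain a
formula for any vector $\mathbf{x}$ in $E^3$ as a linear combination of $\mathbf{a}_1, \mathbf{a}_2, \mathbf{a}_3$, that is,
evaluate the $\lambda_i$ in terms of the components of $\mathbf{x}, \mathbf{a}_1, \mathbf{a}_2, \mathbf{a}_3$ when

$$\mathbf{x} = \lambda_1\mathbf{a}_1 + \lambda_2\mathbf{a}_2 + \lambda_3\mathbf{a}_3.$$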

2-8. Given the basis vectors ei = [1, 0, 0], [0, 1, 1], e 3 = [0, 0, 1] for E 3 . 
Which vectors can be removed from the basis and be replaced by b = [4, 3, 6], 
while still maintaining a basis? Illustrate this geometrically. 

2-9. Given the basis vectors ei, [0, 1, 1], e 2 for E 3 . Which vectors can be 
removed from the basis and be replaced by b = [4, 3, 3], while still maintaining 
a basis? Illustrate this geometrically. 

2-10. Let $\mathbf{b}_1, \ldots, \mathbf{b}_n$ be a basis for $E^n$. Suppose that when a given vector $\mathbf{x}$
in $E^n$ is written as

$$\mathbf{x} = \lambda_1\mathbf{b}_1 + \cdots + \lambda_n\mathbf{b}_n,$$

it has all $\lambda_i > 0$. Assume that we have a vector $\mathbf{a}$ and want to remove some $\mathbf{b}_i$
from the basis and replace it by $\mathbf{a}$ to form a new basis:

$$\mathbf{a} = \alpha_1\mathbf{b}_1 + \cdots + \alpha_n\mathbf{b}_n.$$

Under what conditions can $\mathbf{a}$ replace $\mathbf{b}_i$? Assuming $\mathbf{a}$ can replace $\mathbf{b}_i$, write $\mathbf{x}$
as a linear combination of the new basis vectors. In addition, let us require that
the scalar coefficients in the linear combination which expresses $\mathbf{x}$ in terms of
the new basis vectors be non-negative. Show that a necessary condition for the
new scalar multipliers to be non-negative when $\mathbf{b}_i$ is removed is that $\alpha_i > 0$.
It does not follow, however, that simply because $\alpha_i > 0$, all the new scalar
coefficients will be non-negative. If several $\alpha_i > 0$, show how to determine
which vector $\mathbf{b}_i$ is to be removed so that the new scalar coefficients will be
non-negative. Hint:

$$\mathbf{x} = \sum \lambda_i\mathbf{b}_i = \sum \lambda_i\mathbf{b}_i - \theta\mathbf{a} + \theta\mathbf{a},$$

and

$$\theta\mathbf{a} = \theta\sum \alpha_i\mathbf{b}_i, \quad \theta > 0.$$

Note that $\lambda_i - \theta\alpha_i$ must be zero for one $\mathbf{b}_i$, and $\lambda_i - \theta\alpha_i \geq 0$ for all other $\mathbf{b}_i$.

2-11. Suppose that in Problem 2-10, we have defined a function $z_0 = \sum c_i\lambda_i$,
where the $c_i$ are scalars, and the $\lambda_i$ are as defined there. Corresponding to each
$\mathbf{b}_i$ there is a number $c_i$ which we shall call its price. Let the price of $\mathbf{a}$ be $c$.
Now $\mathbf{a}$ is inserted into the basis and some $\mathbf{b}_i$ is removed, the $\mathbf{b}_i$ being chosen by
the method developed in Problem 2-10, so that the new scalar coefficients are
non-negative. Find an equation for the new value of $z_0$ when $\mathbf{a}$ is introduced
into the basis. Denote $\sum c_i\alpha_i$ by $z$. Imagine next that several different vectors $\mathbf{a}_j$
can be inserted into the basis, but that only one of them will be actually
introduced. Develop a criterion for selecting the $\mathbf{a}_j$ which will produce the maximum
increase for $z_0$. What happens when there is no increase in $z_0$ after insertion of
any of the $\mathbf{a}_j$? How is this expressed mathematically?

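2-12. Compute the angle between the vectors

$$\mathbf{a} = [4, 7, 9, 1, 3], \quad \mathbf{b} = [2, 1, 1, 6, 8].$$

2-13. If the $n$-component vectors $\mathbf{a}, \mathbf{b}, \mathbf{c}$ are linearly independent, show that
$\mathbf{a} + \mathbf{b}$, $\mathbf{b} + \mathbf{c}$, $\mathbf{a} + \mathbf{c}$ are also linearly independent. Is this true of $\mathbf{a} - \mathbf{b}$,
$\mathbf{b} + \mathbf{c}$, $\mathbf{a} + \mathbf{c}$?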

2-14. Show how the change of basis technique (discussed in Section 2-9)
can be used to express a vector in terms of some arbitrary basis by starting
with the unit vectors as a basis and then proceeding to insert the vectors of
the arbitrary basis. Illustrate by expressing $\mathbf{b} = [4, 6, 1]$ in terms of $\mathbf{a}_1 = [3, 1, 2]$,
$\mathbf{a}_2 = [1, 3, 9]$, $\mathbf{a}_3 = [2, 8, 5]$.

2-15. Consider the set of $n$-tuples $\mathbf{x} = [x_1, x_2, \ldots, x_n]$ for which $x_1, \ldots, x_m$
are completely unrestricted, while

$$x_{m+i} = \sum_{j=1}^{m} a_{ij}x_j, \quad i = 1, \ldots, n - m.$$

Show that this set of vectors forms a subspace of $V_n$. What is the dimension of
the subspace? Is the set of vectors

$$\mathbf{a}_1 = [1, 0, 0, \ldots, 0, a_{11}, a_{21}, \ldots, a_{n-m,1}], \quad \ldots,$$

$$\mathbf{a}_m = [0, 0, \ldots, 1, a_{1m}, \ldots, a_{n-m,m}]$$

a basis for the subspace?

2-16. Do the following vectors form a basis for $E^3$?

(a) $\mathbf{a}_1 = [3, 0, 2]$, $\mathbf{a}_2 = [7, 0, 9]$, $\mathbf{a}_3 = [4, 1, 2]$;

(b) $\mathbf{a}_1 = [1, 1, 0]$, $\mathbf{a}_2 = [3, 0, 1]$, $\mathbf{a}_3 = [5, 2, 1]$;

(c) $\mathbf{a}_1 = [1, 5, 7]$, $\mathbf{a}_2 = [4, 0, 6]$, $\mathbf{a}_3 = [1, 0, 0]$.

2-17. Consider two subspaces $S_n'$ and $S_n''$ of $V_n$. Define the sum of the
subspaces, written symbolically $S_n' + S_n''$, as the collection of all vectors $\mathbf{a} + \mathbf{b}$,
where $\mathbf{a}$ is any vector in $S_n'$, and $\mathbf{b}$ is any vector in $S_n''$. Prove that $S_n' + S_n''$
is also a subspace of $V_n$. If $S_2'$ is the collection of vectors $\lambda[2, 1]$, and $S_2''$ is the
collection of vectors $\alpha[1, 3]$, show that $S_2' + S_2''$ is $V_2$. Illustrate geometrically.

2-18. Referring to Problem 2-17, prove that if $\mathbf{0}$ is the only vector which is
common to $S_n'$ and $S_n''$, the dimension of $(S_n' + S_n'')$ equals dimension of $S_n'$ +
dimension of $S_n''$. Hint: Let $\mathbf{a}_1, \ldots, \mathbf{a}_u$ be a basis for $S_n'$, and $\mathbf{b}_1, \ldots, \mathbf{b}_v$ a basis
for $S_n''$. Show that any vector in $S_n' + S_n''$ can be written as a linear combination
of $\mathbf{a}_1, \ldots, \mathbf{a}_u, \mathbf{b}_1, \ldots, \mathbf{b}_v$. To prove that $\mathbf{a}_1, \ldots, \mathbf{a}_u, \mathbf{b}_1, \ldots, \mathbf{b}_v$ are linearly
independent, suppose

$$\sum \lambda_i\mathbf{a}_i + \sum \delta_j\mathbf{b}_j = \mathbf{0} \quad \text{or} \quad \sum \lambda_i\mathbf{a}_i = -\sum \delta_j\mathbf{b}_j.$$

Note that $\sum \lambda_i\mathbf{a}_i$ is in $S_n'$ and $\sum \delta_j\mathbf{b}_j$ is in $S_n''$.

2-19. The intersection of two subspaces, written $S_n' \cap S_n''$, is the collection
of all vectors which are in both $S_n'$ and $S_n''$. Prove that the collection of vectors
$S_n' \cap S_n''$ is a subspace of $V_n$. Let $S_2'$ be the collection of vectors $\lambda\mathbf{e}_1$ and $S_2'' = V_2$.
What is $S_2' \cap S_2''$? Illustrate geometrically.

2-20. Prove that the dimension of $S_n' \cap S_n'' \leq$ minimum [dimension of $S_n'$
and dimension of $S_n''$].

2-21. Prove that dimension $(S_n' + S_n'')$ + dimension $(S_n' \cap S_n'')$ = dimension
$S_n'$ + dimension $S_n''$. Hint: Let $\mathbf{u}_1, \ldots, \mathbf{u}_r$ be a basis for $S_n' \cap S_n''$, $\mathbf{a}_1, \ldots, \mathbf{a}_u$,
$\mathbf{u}_1, \ldots, \mathbf{u}_r$ a basis for $S_n'$, and $\mathbf{b}_1, \ldots, \mathbf{b}_v$, $\mathbf{u}_1, \ldots, \mathbf{u}_r$ a basis for $S_n''$.

2-22. Let ai, ..., a m be a set of linearly independent vectors from E n and 
a any other vector in E n . Show that if a is linearly independent of the set 
ai, ..., am, then a, ai, ..., a m is a linearly independent set of vectors. 

2-23. In the text it was shown under what circumstances a vector could 
be removed from a basis and replaced by another vector in such a way that the 
new set of vectors was also a basis. Suppose now that we wish to form a new 
basis by removing two vectors from a basis and replacing them by two other 
vectors. Try to derive the conditions under which the new set of vectors will 
be a basis. What are the difficulties? Is it easier to insert the two vectors one 
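at a time or both together?

2-24. Prove that $|\mathbf{a} - \mathbf{b}| > 0$ unless $\mathbf{a} = \mathbf{b}$. Prove that $|\mathbf{a} - \mathbf{b}| = |\mathbf{b} - \mathbf{a}|$.

2-25. Prove the Schwarz inequality $|\mathbf{a}'\mathbf{b}| \leq |\mathbf{a}|\,|\mathbf{b}|$. Hint: $|\lambda\mathbf{a} + \mathbf{b}|^2 =
\lambda^2|\mathbf{a}|^2 + 2\lambda\mathbf{a}'\mathbf{b} + |\mathbf{b}|^2 \geq 0$ for all $\lambda$. Thus the quadratic equation $|\lambda\mathbf{a} + \mathbf{b}|^2 = 0$
cannot have two different real roots.

2-26. Prove the triangle inequality $|\mathbf{a} - \mathbf{b}| + |\mathbf{b} - \mathbf{c}| \geq |\mathbf{a} - \mathbf{c}|$. Hint:
$|\mathbf{a} - \mathbf{c}|^2 \leq |\mathbf{a} - \mathbf{b}|^2 + 2|(\mathbf{a} - \mathbf{b})'(\mathbf{b} - \mathbf{c})| + |\mathbf{b} - \mathbf{c}|^2$. Use the Schwarz
inequality.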

2-27. By direct computation, verify the Schwarz inequality for the special 
case 

a = [4,1,2], b = [3,7,9]. 

2-28. Verify by direct computation the triangle inequality for the special 
case 

a = [3,7,1], b = [9,1,4], c = [3,0,2]. 

2-29. Verify the triangle inequality when 

a = [1,1, 1], b = [3, 3, 3], c = [-2, -2, -2]. 

Illustrate geometrically. Repeat with c = [2, 2, 2]. 

*2-30. Consider the basis $\mathbf{a}_1 = 5^{-1/2}[1, 2]$, $\mathbf{a}_2 = 10^{-1/2}[-1, 3]$ for $E^2$. Using
$\mathbf{a}_1, \mathbf{a}_2$ to define a coordinate system, find $\alpha_1, \alpha_2$ in

$$\mathbf{x} = \alpha_1\mathbf{a}_1 + \alpha_2\mathbf{a}_2,$$

where $\mathbf{x} = [x_1, x_2]$ when $\mathbf{e}_1, \mathbf{e}_2$ define the coordinate system. Illustrate this
geometrically.

*2-31. Using the Schmidt orthogonalization process, construct an orthonormal
basis for $E^3$ from the following set of basis vectors:

$$\mathbf{a}_1 = [2, 6, 3], \quad \mathbf{a}_2 = [9, 1, 0], \quad \mathbf{a}_3 = [1, 2, 7].$$

Illustrate this geometrically. 

*2-32. Consider any subspace $S_n$ of $V_n$. The set of vectors orthogonal to every
vector in $S_n$ is called the orthogonal complement of $S_n$ [written $O(S_n)$]. Prove
that $O(S_n)$ is a subspace of $V_n$. Let $S_3$ be a set of vectors of the form $[x_1, x_2, 0]$.
What is $O(S_3)$?

*2-33. Prove that any vector $\mathbf{x}$ of $V_n$ is in $S_n + O(S_n)$, that is, prove
$S_n + O(S_n) = V_n$. Hint: Choose an orthonormal basis $\mathbf{a}_1, \ldots, \mathbf{a}_u$ for $S_n$ and
an orthonormal basis $\mathbf{b}_1, \ldots, \mathbf{b}_v$ for $O(S_n)$. No vector in $V_n$ can be orthogonal
to these basis vectors for $S_n$ and $O(S_n)$. If $\mathbf{x}$ is not orthogonal to the set
$\mathbf{a}_1, \ldots, \mathbf{a}_u$ or the set $\mathbf{b}_1, \ldots, \mathbf{b}_v$, then $\mathbf{x} - \sum (\mathbf{a}_i'\mathbf{x})\mathbf{a}_i$ is orthogonal to the set
$\mathbf{a}_1, \ldots, \mathbf{a}_u$. Illustrate this theorem for the case where $S_n = S_3$ is the set of
vectors $[x_1, x_2, 0]$.

*2-34. Prove that $O[O(S_n)] = S_n$. Illustrate this geometrically.


* Starred problems refer to starred sections in the text.

The following problems deal with generalizations to vectors whose
components are complex numbers.

2-35. The solutions to a quadratic equation, such as $\alpha x^2 + \beta x + \gamma = 0$
($\alpha, \beta, \gamma$ real), cannot always be written as real numbers. Depending on the
values of $\alpha, \beta, \gamma$, the solutions may take the form $x = a + ib$, where $i^2 = -1$
or $i = \sqrt{-1}$, and $a, b$ are real numbers. To see this, solve $x^2 + 2x + 5 = 0$.
A number of the form $z = a + ib$ ($a, b$ real) is called a complex number.
Furthermore, $a$ is called the real part of $z$, and $b$ the imaginary part of $z$ (note that
there is nothing "imaginary" about $b$; it is a real number). We write $z = 0$
($0 = 0 + i0$) if and only if $a = b = 0$. Given two complex numbers $z_1 =
a_1 + ib_1$, $z_2 = a_2 + ib_2$, $z_1 = z_2$ if and only if $a_1 = a_2$, and $b_1 = b_2$. If the
complex number $z_3$ denotes the sum of $z_1$ and $z_2$, it is defined to be

$$z_3 = z_1 + z_2 = (a_1 + a_2) + i(b_1 + b_2).$$

If $z_3$ represents the product of $z_1$ and $z_2$, it is defined to be

$$z_3 = z_1z_2 = (a_1 + ib_1)(a_2 + ib_2) = (a_1a_2 - b_1b_2) + i(a_1b_2 + a_2b_1).$$

Note that we can obtain this expression by using the ordinary laws of
multiplication and $i^2 = -1$. If the complex number $z_3$ is the quotient of $z_1$ and $z_2$, it
is defined to be

$$z_3 = \frac{z_1}{z_2} = \frac{1}{a_2^2 + b_2^2}\,[(a_1a_2 + b_1b_2) + i(a_2b_1 - a_1b_2)].$$

Problem 2-38 will illustrate an easy method of obtaining the above expression.
Note that $z_1 = z_2z_3$.

Show that the definitions for equality, addition, multiplication, and division
are consistent with the theorems on solutions of quadratic equations. How
should subtraction be defined? Demonstrate that addition and multiplication
of complex numbers satisfy the associative and commutative laws [for
multiplication this means that $z_1z_2 = z_2z_1$ and $z_1(z_2z_3) = (z_1z_2)z_3 = z_1z_2z_3$]. Observe
that real numbers are included in the set of all complex numbers since $b$ in
$z = a + ib$ may be zero.

2-36. The complex conjugate of a complex number $z = a + ib$ is defined
to be $z^* = a - ib$. Show that if $z$ is the solution to a quadratic equation, so is
$z^*$. Prove that

$$(z_1 + z_2)^* = z_1^* + z_2^*, \qquad (z_1z_2)^* = z_1^*z_2^*, \qquad \left(\frac{z_1}{z_2}\right)^* = \frac{z_1^*}{z_2^*}.$$

2-37. The absolute value of a complex number $z = a + ib$, written $|z|$, is
defined to be $|z| = \sqrt{a^2 + b^2}$. Show that $zz^* = z^*z = |z|^2$. Prove that
$|z^*| = |z|$, $|z_1z_2| = |z_1|\,|z_2|$, $|z_1/z_2| = |z_1|/|z_2|$. To write $z_1/z_2$ as the complex
number $z_3 = a_3 + ib_3$, it is necessary to eliminate the $i$ in the denominator,
that is, we multiply the numerator and denominator by $z_2^*$ and obtain $z_3 =
z_1z_2^*/|z_2|^2$; this is the result given above for the quotient of two complex numbers.

2-38. Let $z_1 = 3 + 4i$, $z_2 = 2 + i$. Compute $z_1z_2$, $z_2z_1$, $z_1 + z_2$, $z_1/z_2$,
$z_2/z_1$, $z_1^*$, $z_2^*$, $|z_1|$, $|z_2|$, $(z_1z_2)^*$, $z_1^*z_2^*$, $|z_1/z_2|$, and verify the general rules developed
above.

2-39. Most of the theory developed in this chapter can be extended directly 
to vectors whose components are complex numbers. Given two vectors z = 
[ 21 , . . ., 2 n ] and w = [w\, . . . , w n ] with complex components, z = w if and 
only if Z\ = i = 1Then z + w = [21 + wi, . .., z n + w n ], and 
7z = [Y 21 , . .., 7z„] for any complex number 7. Show that addition is associa¬ 
tive and commutative. The scalar product is defined somewhat differently when 
the vectors have complex components. The scalar product of z and w, written 
zw, is defined to be 

n 

ZW = 

*= 1 

where the * denotes the complex conjugate. This is often referred to as the 
Hermitian scalar product which, if the components are real, reduces to the 
definition of the scalar product given in the text. Show that zw — (Wz)* 
and zz = £”-1 | 2 <| 2 ; zz is called the square of the magnitude of z and is written 
|z| 2 . Observe that \z\ is a real number. One important difference between 
vectors with real components and those with complex components is the loss of 
the simple geometrical interpretation in the latter case, although it is still 
possible to think of n-dimensional spaces with complex coordinates. 

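2-40. Let $\mathbf{z} = [3 + 5i, 2 + i, 1 + 9i]$, $\mathbf{w} = [2i, 4, 5 + 6i]$, $\gamma = 2 + 4i$.
Compute $\mathbf{z} + \mathbf{w}$, $\gamma\mathbf{z}$, $\gamma\mathbf{w}$, $\mathbf{z}\mathbf{w}$, $\mathbf{w}\mathbf{z}$, $|\mathbf{z}|$, $|\mathbf{w}|$.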

2-41. A set of $k$ vectors $\mathbf{z}_1, \ldots, \mathbf{z}_k$ with complex components is linearly
independent if the only set of complex numbers $\alpha_i$ for which $\sum_{i=1}^{k} \alpha_i \mathbf{z}_i = \mathbf{0}$ is
$\alpha_i = 0$, $i = 1, \ldots, k$. Otherwise the set is linearly dependent. Thus we
see that the same definition applies formally to both types of vectors, i.e., to
those with complex components and to those whose components are real numbers.
Under what conditions are the two vectors $\mathbf{z} = [z_1, z_2]$ and $\mathbf{w} = [w_1, w_2]$
linearly dependent? Reduce the condition to one involving real numbers only.
Consider the set of $n$-component vectors $\mathbf{b}_i$, $i = 1, \ldots, n$; all components of
$\mathbf{b}_i$ are zero except the $i$th component which is $1 + i$. Show that these $n$ vectors
are linearly independent. How can any $n$-component vector (with real or complex
components) be expressed as a linear combination of these vectors such
that $\mathbf{z} = \sum_{i=1}^{n} \lambda_i \mathbf{b}_i$ (the $\lambda_i$ being complex numbers)? Hint: If $w = a + ib$,
$w(1 + i) = (a - b) + i(a + b)$. Express $\mathbf{z} = [i, 4 + 5i, 2 + 3i]$ as a linear
combination of $\mathbf{b}_1$, $\mathbf{b}_2$, $\mathbf{b}_3$.

2-42. The set of all $n$-component vectors $\mathbf{z} = [z_1, \ldots, z_n]$, with the $z_i$ taken
from the set of complex numbers, forms an $n$-dimensional vector space which
may be denoted by $V_n(C)$, the $C$ indicating that the components belong to the
set of complex numbers. A basis for this space is a set of linearly independent
vectors which spans the space. Show that every basis contains precisely $n$
vectors and that any set of $n$ linearly independent vectors from $V_n(C)$ is a basis
for $V_n(C)$. Demonstrate that any $n + 1$ vectors in $V_n(C)$ are linearly dependent.
Show that the definition of a subspace can be applied to $V_n(C)$ without any
modification.

2-43. Two vectors with complex components are orthogonal if their Hermitian 
scalar product is zero. Generalize the Schmidt orthogonalization procedure to 
vectors with complex components. Construct an orthogonal basis from Zi = 
[t, 4 + 2It], Z2 = [5 + 6i, 1]. 

2-44. Show that the unit vectors $\mathbf{e}_1, \ldots, \mathbf{e}_n$ are a basis for $V_n(C)$ as well as
$E^n$.



CHAPTER 3 


MATRICES AND DETERMINANTS 


“like the glaze in a 
katydid wing 
subdivided by sun 
till the nettings are legion.” 

Marianne Moore. 


3-1 Matrices. Matrices provide a very powerful tool for dealing with 
linear models. This chapter will discuss some of the elementary properties 
of matrices and investigate the closely related theory of determinants. 

Matrix: A matrix is defined to be a rectangular array of numbers arranged 
into rows and columns. It is written as follows: 


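$$
\begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{bmatrix}
\tag{3-1}
$$

The above array is called an $m$ by $n$ matrix (written $m \times n$) since it has
$m$ rows and $n$ columns. As a rule, brackets [ ], parentheses ( ), or the form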
|| || is used to enclose the rectangular array of numbers. It should be noted 
right at the beginning that a matrix has no numerical value. It is simply a 
convenient way of representing arrays (tables) of numbers. 


Since a matrix is in general a two-dimensional array of numbers, a 
double subscript must be used to represent any one of its elements (entries, 
matrix elements). By convention, the first subscript refers to the row, the 
second to the column. Thus $a_{23}$ refers to the element in the second row,
third column; and $a_{ij}$ refers to the element in the $i$th row, $j$th column.
No relation at all needs to exist between the number of rows and the 
number of columns. A matrix, for example, can have 100 rows and 10 
columns or 1 row and 1000 columns. Any matrix which has the same 
number of rows as columns is called a square matrix. A square matrix with 
n rows and n columns is often called an nth-order matrix. Any nth-order 
matrix is by definition a square matrix.

Examples: The following are matrices:


1 o ’ 


a b 

y 

1- 

CO 

<N 

r—i 

1_ 

.0 1 . 


c e. 


.0 1 0J 


(3, 1, 7). 


But 



11 

B *]• (,‘J- 

7 

8 10 
. 9 . 


are not matrices since they are not rectangular arrays arranged into rows 
and columns. 

Matrices will usually be denoted by upper-case boldface roman letters 
(A, B, etc.), and elements by italicized lower case letters ($a_{ij}$, etc.)
unless specific numbers are used. We can write: 


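$$
\mathbf{A} = \|a_{ij}\| =
\begin{bmatrix}
a_{11} & \cdots & a_{1n} \\
\vdots & & \vdots \\
a_{m1} & \cdots & a_{mn}
\end{bmatrix}.
\tag{3-2}
$$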


Observe that the expressions $\mathbf{A}$, $\|a_{ij}\|$ do not indicate how many rows or
columns the matrix has. This must be known from other sources. When
only the typical element $a_{ij}$ of the matrix is shown we use $\|a_{ij}\|$ rather than
$(a_{ij})$ or $[a_{ij}]$. The latter notations do not indicate clearly whether a matrix
is implied or whether parentheses (brackets) have been placed around a 
single element. We shall adopt the convention that brackets will be used 
to enclose matrices having at least two rows and two columns. Paren¬ 
theses will be used to enclose a matrix of a single row. To simplify the 
printing in the text of matrices having a single column, they will be 
printed as a row and will be enclosed by brackets. The same convention 
is used in printing column vectors. In equations and examples, a matrix 
with a single column will sometimes be printed as a column for added 
clarity. 

In this chapter and in the remainder of the text, the elements of matrices 
will be real numbers. However, the results obtained also hold for matrices 
with complex elements (see Problems 3-78 through 3-86). 


3-2 Matrix operations. It is the proper definition of matrix operations 
that determines their usefulness, since a matrix per se is merely a table of 
numbers. We shall see that the definitions which seem intuitively obvious
for operating with tables of numbers are also the most useful ones.

Equality: Two matrices A and B are said to be equal, written A = B,
if they are identical, that is, if the corresponding elements are equal. Thus,
A = B if and only if $a_{ij} = b_{ij}$ for every $i$, $j$. If A is not equal to B, we
write A ≠ B.

Examples: 



A ≠ B since $a_{12} \neq b_{12}$, $a_{22} \neq b_{22}$.



Clearly, A cannot be equal to B unless the number of rows and columns 
in A is the same as the number of rows and columns in B. If A = B, then 

B = A. 

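Multiplication by a scalar: Given a matrix A and a scalar $\lambda$, the
product of $\lambda$ and A, written $\lambda\mathbf{A}$, is defined to be

$$
\lambda\mathbf{A} =
\begin{bmatrix}
\lambda a_{11} & \cdots & \lambda a_{1n} \\
\lambda a_{21} & \cdots & \lambda a_{2n} \\
\vdots & & \vdots \\
\lambda a_{m1} & \cdots & \lambda a_{mn}
\end{bmatrix}.
\tag{3-3}
$$

Each element of A is multiplied by the scalar $\lambda$. The product $\lambda\mathbf{A}$ is then
another matrix having $m$ rows and $n$ columns if A has $m$ rows and $n$
columns. We note:

$$\lambda\mathbf{A} = \|\lambda a_{ij}\| = \|a_{ij}\lambda\| = \mathbf{A}\lambda.$$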


Examples: 



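Addition: The sum C of a matrix A having $m$ rows and $n$ columns and a
matrix B having $m$ rows and $n$ columns is a matrix having $m$ rows and $n$
columns whose elements are given by

$$c_{ij} = a_{ij} + b_{ij} \quad (\text{all } i, j). \tag{3-4}$$

Written out in detail, (3-4) becomes

$$
\mathbf{C} =
\begin{bmatrix}
c_{11} & \cdots & c_{1n} \\
\vdots & & \vdots \\
c_{m1} & \cdots & c_{mn}
\end{bmatrix}
=
\begin{bmatrix}
a_{11} & \cdots & a_{1n} \\
\vdots & & \vdots \\
a_{m1} & \cdots & a_{mn}
\end{bmatrix}
+
\begin{bmatrix}
b_{11} & \cdots & b_{1n} \\
\vdots & & \vdots \\
b_{m1} & \cdots & b_{mn}
\end{bmatrix}
=
\begin{bmatrix}
a_{11} + b_{11} & \cdots & a_{1n} + b_{1n} \\
\vdots & & \vdots \\
a_{m1} + b_{m1} & \cdots & a_{mn} + b_{mn}
\end{bmatrix}.
\tag{3-5}
$$

This expression can be abbreviated to

$$\mathbf{C} = \mathbf{A} + \mathbf{B}. \tag{3-6}$$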


Two matrices are added by adding the corresponding elements. 
Examples: 



(3) $\mathbf{A} + \mathbf{A} = \|a_{ij}\| + \|a_{ij}\| = \|a_{ij} + a_{ij}\| = \|2a_{ij}\| = 2\mathbf{A}$,
and 

A + A + A = 3A, etc. 


Note that addition of matrices A, B is defined only when B has the same 
number of rows and the same number of columns as A. Addition is not 
defined in the following example: 


(4) A = 



B - 


2 4 6 
.13 5. 


(addition not defined).

It is obvious from the definition that matrix addition follows the associative
and commutative laws, that is,

A + B = B + A, (3-7)

A + (B + C) = (A + B) + C = A + B + C, (3-8)

since addition proceeds by elements and the laws hold for real numbers.
Subtraction is defined in terms of operations already considered:

A − B = A + (−1)B. (3-9)

Thus, we subtract two matrices by subtracting the corresponding elements. 
The restrictions on number of rows and columns are the same for sub¬ 
traction and addition. 

Example: 

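$$
\mathbf{A} = \begin{bmatrix} 2 & 5 \\ 3 & 6 \\ 4 & 8 \end{bmatrix}, \quad
\mathbf{B} = \begin{bmatrix} 4 & 2 \\ 1 & 0 \\ 2 & 0 \end{bmatrix}; \quad
\mathbf{A} - \mathbf{B} = \begin{bmatrix} -2 & 3 \\ 2 & 6 \\ 2 & 8 \end{bmatrix}.
$$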
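
The three elementwise operations of this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions: matrices are represented as lists of rows, and the function names `scalar_multiply`, `add`, and `subtract` are our own, not notation from the text. The matrices A and B are those of the subtraction example.

```python
def scalar_multiply(lam, A):
    """Return lam*A: every element of A multiplied by the scalar lam (Eq. 3-3)."""
    return [[lam * a for a in row] for row in A]


def add(A, B):
    """Return A + B; defined only when A and B have the same shape (Eq. 3-4)."""
    same_shape = len(A) == len(B) and all(
        len(ra) == len(rb) for ra, rb in zip(A, B)
    )
    if not same_shape:
        raise ValueError("addition is defined only for matrices of the same size")
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def subtract(A, B):
    """Return A - B, computed as A + (-1)B (Eq. 3-9)."""
    return add(A, scalar_multiply(-1, B))


A = [[2, 5], [3, 6], [4, 8]]
B = [[4, 2], [1, 0], [2, 0]]
print(subtract(A, B))                      # the difference A - B
print(add(A, A) == scalar_multiply(2, A))  # checks A + A = 2A
```

Calling `add` on matrices of different sizes raises an error, which mirrors the statement that addition is not defined in that case.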


3-3 Matrix multiplication—introduction. It was fairly obvious how 
equality, multiplication by a scalar, and addition should be defined for 
matrices. It is not nearly so obvious, however, how matrix multiplica¬ 
tion should be defined. At this point, the concept of matrices as mere 
tables of data must be abandoned since this approach does not give a 
clue to a proper definition. Our first thought would be to multiply the 
corresponding elements to obtain the product. However, this definition 
turns out to be of little value in the majority of problems requiring appli¬ 
cation of matrices (although recently, in some data processing operations, 
it has been found useful). 

Instead, we shall approach the subject in a different way. The value 
of matrices stems mostly from the fact that they permit us to deal effi¬ 
ciently with systems of simultaneous linear equations and related subjects. 
We have already encountered systems of simultaneous linear equations 
in Section 2-8 on the linear dependence of vectors. Let us consider a 
set of $m$ simultaneous linear equations in $n$ unknowns, the $x_j$, which can
be written

$$
\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= d_1, \\
&\;\;\vdots \\
a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n &= d_m.
\end{aligned}
\tag{3-10}
$$


Note, first of all, that the coefficients $a_{ij}$ can be arranged into an $m \times n$
matrix $\mathbf{A} = \|a_{ij}\|$. Furthermore, the variables $x_j$ may be arranged into
a matrix of a single column and $n$ rows which will be denoted by
$\mathbf{X} = [x_1, \ldots, x_n]$. Similarly, the constants $d_i$ may be arranged into a matrix
of one column and $m$ rows which can be written $\mathbf{D} = [d_1, \ldots, d_m]$. The
idea suggests itself to write the entire set of equations (3-10) in the
abbreviated form*

AX = D. (3-11) 

Equation (3-11) will be interpreted as meaning that the matrix D is ob¬ 
tained by multiplying matrix A by matrix X. This interpretation will be 
correct only if element $d_i$ of D is computed from the elements in A, X
according to (3-10):

$$d_i = \sum_{j=1}^{n} a_{ij} x_j, \quad i = 1, \ldots, m. \tag{3-12}$$


Here we have a lead toward a useful definition of matrix multiplication. 
However, Eqs. (3-11) and (3-12) suggest a way only for the multiplication 
of a matrix by another matrix of a single column. It is immediately ap¬ 
parent, however, that if our definition is to make sense, the number of 
columns in A must be the same as the number of rows in X; that is, in 
(3-10), there is one column in A for each variable and one row (element) 
in X for each variable. 

To generalize the definition of matrix multiplication, let us imagine
that the variables $x_j$ are defined in terms of another set of variables $y_k$ by
the following relations:

$$
\begin{aligned}
x_1 &= b_{11}y_1 + \cdots + b_{1r}y_r, \\
&\;\;\vdots \\
x_n &= b_{n1}y_1 + \cdots + b_{nr}y_r,
\end{aligned}
\tag{3-13}
$$

that is,

$$x_j = \sum_{k=1}^{r} b_{jk} y_k, \quad j = 1, \ldots, n.$$

Each original variable $x_j$ is a linear combination of the new variables $y_k$.


* The reader can, no doubt, think of many other possible forms beside that 
given in (3-11). For example, we might write (3-10) as XA = D. Furthermore, 
we could define X as a matrix with one row and n columns and D as a matrix 
with one row and m columns, and with this notation, write AX = D or XA = D. 
There are many other feasible combinations. The one we have selected leads to 
a useful definition of matrix multiplication, while many of the others do not. 
Instead of investigating all the possibilities, we shall begin with the one which 
leads directly to our goal.

Substitution of (3-13) into (3-12) gives

$$\sum_{j=1}^{n} a_{ij} \sum_{k=1}^{r} b_{jk} y_k = d_i. \tag{3-14}$$

Rearranging the summation signs, we have

$$\sum_{k=1}^{r} \left( \sum_{j=1}^{n} a_{ij} b_{jk} \right) y_k = d_i. \tag{3-15}$$

Let us write

$$c_{ik} = \sum_{j=1}^{n} a_{ij} b_{jk}, \quad i = 1, \ldots, m, \quad k = 1, \ldots, r. \tag{3-16}$$

Then (3-15) becomes

$$\sum_{k=1}^{r} c_{ik} y_k = d_i, \quad i = 1, \ldots, m. \tag{3-17}$$

This is a set of $m$ simultaneous linear equations in $r$ variables, the $y_k$.
Suppose we write $\mathbf{B} = \|b_{jk}\|$ and $\mathbf{Y} = [y_1, \ldots, y_r]$ so that, using our
simplified notation (3-11), Eq. (3-13) can be written X = BY. Furthermore,
if $\mathbf{C} = \|c_{ik}\|$, (3-17) becomes CY = D. If we replace X by BY
in (3-11), we obtain ABY = D. Thus it appears that C should be the
product of A and B, that is:

C = AB. (3-18) 


For this interpretation to be valid, we must define multiplication in such 
a way that each element of C is computed from the elements of A, B 
by means of (3-16). 

The matrix C is m X r, while A is m X n, and B is n X r. To define 
the product AB [see (3-18)], it is necessary that the number of columns 
in A be the same as the number of rows in B since there is one column 
in A and one row in B for each variable $x_j$. However, there is no restriction
whatever on the number of rows (the number of equations) in A
or the number of columns (the number of variables $y_k$) in B. Given any
two matrices A, B, the preceding discussion suggests a way to form the 
product C = AB provided the number of columns in A is equal to the 
number of rows in B, and we are now in a position to give the general 
definition of matrix multiplication. 

Matrix multiplication: Given an $m \times n$ matrix A and an $n \times r$
matrix B, the product AB is defined to be an $m \times r$ matrix C, whose elements
are computed from the elements of A, B according to

$$c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}, \quad i = 1, \ldots, m, \quad j = 1, \ldots, r. \tag{3-19}$$

In the matrix product AB, A is called the premultiplier, and B the post¬ 
multiplier. The product AB is defined only when the number of columns 
in A is equal to the number of rows in B, so that we arrive at the following 
rule: 

Rule. A and B are conformable for multiplication to yield the product AB 
if and only if the number of columns in A is the same as the number of rows 
in B. 

As long as the number of columns in A is the same as the number of rows 
in B, AB is defined; it is immaterial how many rows there are in A or how 
many columns in B. The product C = AB has the same number of rows 
as A and the same number of columns as B. 
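
The definition (3-19) and the conformability rule translate almost word for word into code. The following Python sketch assumes matrices stored as lists of rows; the function name `matmul` and the small matrices A, B, Y are hypothetical and chosen only for illustration. It also checks the change of variables that motivated the definition: with X = BY, the product (AB)Y gives the same D as A(BY).

```python
def matmul(A, B):
    """Return C = AB with c_ij = sum over k of a_ik * b_kj (Eq. 3-19)."""
    m, n = len(A), len(A[0])
    if len(B) != n:
        # Conformability rule: columns of A must equal rows of B.
        raise ValueError("number of columns in A must equal number of rows in B")
    r = len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(r)]
            for i in range(m)]


# Hypothetical 2 x 3 and 3 x 2 matrices, chosen only for illustration.
A = [[1, 0, 2],
     [0, 3, 1]]
B = [[1, 2],
     [0, 1],
     [4, 0]]
Y = [[1], [2]]       # the new variables y_k, written as a single column

X = matmul(B, Y)     # X = BY
D = matmul(A, X)     # D = AX
C = matmul(A, B)     # C = AB has as many rows as A and as many columns as B
print(C)
print(matmul(C, Y) == D)   # CY reproduces the same D
```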

Note: The reader who is not familiar with double sums and with interchanging
the summation signs in going from (3-14) to (3-15) should
consider the following example: Imagine the $m \times n$ matrix

$$
\begin{bmatrix}
a_1 b_{11} & a_1 b_{12} & \cdots & a_1 b_{1n} \\
a_2 b_{21} & a_2 b_{22} & \cdots & a_2 b_{2n} \\
\vdots & \vdots & & \vdots \\
a_m b_{m1} & a_m b_{m2} & \cdots & a_m b_{mn}
\end{bmatrix}.
$$

Then

$$\sum_{k=1}^{m} \left[ \sum_{j=1}^{n} a_k b_{kj} \right] = \sum_{k=1}^{m} a_k \left[ \sum_{j=1}^{n} b_{kj} \right] \tag{3-20}$$

is the sum of all the elements $a_k b_{kj}$ in the matrix obtained by first adding
all the elements in one row and then summing over the rows. Similarly,

$$\sum_{j=1}^{n} \left[ \sum_{k=1}^{m} a_k b_{kj} \right] \tag{3-21}$$

is the sum of all the elements in the matrix where we first sum all the
elements in one column and then sum over the columns. Since (3-20)
and (3-21) are two expressions for the same thing, they are equal.
Identification of the $a_k$, $b_{kj}$ of (3-20) and (3-21) with the $a_{ij}$, $b_{jk}y_k$ of (3-14),
respectively, yields the desired result. Often it is useful to represent either
(3-20) or (3-21) by the simplified notation

$$\sum_{k,j} a_k b_{kj}.$$

3-4 Matrix multiplication—further development. The consideration of
linear equations and the change of variables in linear equations led us to 
the most useful definition of matrix multiplication. In fact, it was the 
search for a simplified notation applicable to systems of linear equations 
which inspired Cayley in the 1850’s to introduce matrix notation. We 
shall now clarify the definition of matrix multiplication and develop some 
of the properties of the matrix product. 

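At first glance, Eq. (3-19) looks rather strange; quite a few subscripts
appear. Note first of all that the element $c_{ij}$ depends on all the elements
in row $i$ of A and on all the elements in column $j$ of B. The rule for matrix
multiplication is quite similar to the definition of the scalar product of
vectors. Indeed, if we think of the $i$th row of A and the $j$th column of B
as vectors, then the element in the $i$th row and the $j$th column of C is,
in fact, the scalar product of the $i$th row of A and the $j$th column of B.
Diagrammatically,

$$
\begin{bmatrix}
c_{11} & & \cdots & & c_{1r} \\
\vdots & & & & \vdots \\
c_{i1} & \cdots & c_{ij} & \cdots & c_{ir} \\
\vdots & & & & \vdots \\
c_{m1} & & \cdots & & c_{mr}
\end{bmatrix}
=
\begin{bmatrix}
a_{11} & \cdots & a_{1n} \\
\vdots & & \vdots \\
a_{i1} & \cdots & a_{in} \\
\vdots & & \vdots \\
a_{m1} & \cdots & a_{mn}
\end{bmatrix}
\begin{bmatrix}
b_{11} & \cdots & b_{1j} & \cdots & b_{1r} \\
\vdots & & \vdots & & \vdots \\
b_{n1} & \cdots & b_{nj} & \cdots & b_{nr}
\end{bmatrix}.
\tag{3-22}
$$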


Before we proceed any further, it will be helpful to examine some examples
of multiplication.


Examples: 


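(1)
$$
\mathbf{A} = \begin{bmatrix} 1 & 3 \\ 2 & 4 \end{bmatrix}, \quad
\mathbf{B} = \begin{bmatrix} 2 & 1 \\ 3 & 5 \end{bmatrix};
$$
$$
\mathbf{C} = \mathbf{AB} =
\begin{bmatrix} [1(2) + 3(3)] & [1(1) + 3(5)] \\ [2(2) + 4(3)] & [2(1) + 4(5)] \end{bmatrix}
= \begin{bmatrix} 11 & 16 \\ 16 & 22 \end{bmatrix}.
$$

(2)
$$
\mathbf{A} = \begin{bmatrix} 3 & 2 \\ 6 & 1 \end{bmatrix}, \quad
\mathbf{B} = \begin{bmatrix} 4 \\ 5 \end{bmatrix};
$$
$$
\mathbf{C} = \mathbf{AB} =
\begin{bmatrix} 3 & 2 \\ 6 & 1 \end{bmatrix}
\begin{bmatrix} 4 \\ 5 \end{bmatrix}
= \begin{bmatrix} [3(4) + 2(5)] \\ [6(4) + 1(5)] \end{bmatrix}
= \begin{bmatrix} 22 \\ 29 \end{bmatrix}.
$$

(3)
$$
\mathbf{A} = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}, \quad
\mathbf{B} = \begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{bmatrix};
$$
$$
\mathbf{C} = \mathbf{AB} =
\begin{bmatrix}
[a_{11}b_{11} + a_{12}b_{21}] & [a_{11}b_{12} + a_{12}b_{22}] \\
[a_{21}b_{11} + a_{22}b_{21}] & [a_{21}b_{12} + a_{22}b_{22}]
\end{bmatrix}.
$$

(4)
$$
\mathbf{A} = \begin{bmatrix} 3 & 0 \\ 1 & 1 \\ 5 & 2 \end{bmatrix}, \quad
\mathbf{B} = \begin{bmatrix} 4 & 7 \\ 6 & 8 \end{bmatrix};
$$
$$
\mathbf{C} = \mathbf{AB} =
\begin{bmatrix} 3 & 0 \\ 1 & 1 \\ 5 & 2 \end{bmatrix}
\begin{bmatrix} 4 & 7 \\ 6 & 8 \end{bmatrix}
= \begin{bmatrix}
[3(4) + 0(6)] & [3(7) + 0(8)] \\
[1(4) + 1(6)] & [1(7) + 1(8)] \\
[5(4) + 2(6)] & [5(7) + 2(8)]
\end{bmatrix}
= \begin{bmatrix} 12 & 21 \\ 10 & 15 \\ 32 & 51 \end{bmatrix}.
$$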


In order to get a feeling for the size of the matrix resulting from multi¬ 
plication of two matrices, it may be helpful to represent the matrices as 
rectangular blocks (see Fig. 3-1). 

Matrix multiplication does not obey all the rules for the multiplication 
of ordinary numbers. One of the most important differences is the fact 
that, in general, matrix multiplication is not commutative, that is, AB and 
BA are not the same thing. In fact, it does not even need to be true that 
AB and BA are both defined. Thus, as a rule, AB ≠ BA. In the special
case where AB = BA, the matrices are said to commute. 


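Examples:

(1)
$$
\mathbf{A} = \begin{bmatrix} 2 & 1 \\ 3 & 1 \end{bmatrix}, \quad
\mathbf{B} = \begin{bmatrix} 2 \\ 1 \end{bmatrix}; \quad
\mathbf{AB} = \begin{bmatrix} 5 \\ 7 \end{bmatrix}.
$$

However, BA is not defined since the number of columns in B is not the
same as the number of rows in A.

(2)
$$
\mathbf{A} = \begin{bmatrix} 1 \\ 2 \\ 0 \\ 1 \end{bmatrix}, \quad
\mathbf{B} = (3, 4, 1, 5); \quad
\mathbf{AB} = \begin{bmatrix} 1 \\ 2 \\ 0 \\ 1 \end{bmatrix} (3, 4, 1, 5)
= \begin{bmatrix}
3 & 4 & 1 & 5 \\
6 & 8 & 2 & 10 \\
0 & 0 & 0 & 0 \\
3 & 4 & 1 & 5
\end{bmatrix};
$$
$$
\mathbf{BA} = (3, 4, 1, 5) \begin{bmatrix} 1 \\ 2 \\ 0 \\ 1 \end{bmatrix}
= (3 + 8 + 5) = (16).
$$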


In this example, both AB and BA are defined, but the products are completely
different.

[Figure 3-1: the matrices A, B, and C = AB drawn as rectangular blocks.]


Multiplication does not have to be commutative even for two n X n 
matrices where both AB and BA are defined and are n X n matrices. 


Example: 



A = 

Vi 


AB = 

:l: :H;:] 

and 

BA = 

:] 

AB ≠ BA.
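
The failure of commutativity is easy to confirm numerically. The sketch below is an illustration only: `matmul` is our own helper implementing (3-19) on lists of rows, the column A and row B are those of example (2) above, and the square matrices P and Q are hypothetical, chosen by us because they do not commute.

```python
def matmul(A, B):
    """Return AB computed from c_ij = sum_k a_ik * b_kj (Eq. 3-19)."""
    if len(A[0]) != len(B):
        raise ValueError("A and B are not conformable for multiplication")
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


# A single column times a single row, and the reverse.
A = [[1], [2], [0], [1]]   # 4 x 1
B = [[3, 4, 1, 5]]         # 1 x 4
AB = matmul(A, B)          # a 4 x 4 matrix
BA = matmul(B, A)          # a 1 x 1 matrix

# Hypothetical second-order matrices: both products are 2 x 2 yet differ.
P = [[1, 2], [3, 4]]
Q = [[0, 1], [1, 0]]
PQ = matmul(P, Q)
QP = matmul(Q, P)

print(len(AB), len(AB[0]), BA)
print(PQ, QP, PQ == QP)
```

Postmultiplying P by Q interchanges the columns of P, while premultiplying by Q interchanges its rows, which is why the two products differ.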


Although matrix multiplication is not in general commutative, the
associative and distributive laws do hold (when the appropriate operations
are defined):

(AB)C = A(BC) = ABC (associative law) (3-23)

A(B + C) = AB + AC (distributive law). (3-24)

To prove the associative law, we only need to note that

$$\sum_{r} \left( \sum_{k} a_{ik} b_{kr} \right) c_{rj} = \sum_{k} a_{ik} \left( \sum_{r} b_{kr} c_{rj} \right) = \sum_{k,r} a_{ik} b_{kr} c_{rj}, \tag{3-25}$$

since, as discussed before, summation signs are interchangeable. Similarly,
for the distributive law,

$$\sum_{k} a_{ik}(b_{kj} + c_{kj}) = \sum_{k} a_{ik} b_{kj} + \sum_{k} a_{ik} c_{kj}. \tag{3-26}$$

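According to the associative law, ABC is unique; we can compute this
product by either computing AB and postmultiplying by C, or by computing
BC and premultiplying by A.

Examples:

(1)
$$
\mathbf{A} = (1, 2), \quad
\mathbf{B} = \begin{bmatrix} 3 & 4 \\ 2 & 1 \end{bmatrix}, \quad
\mathbf{C} = \begin{bmatrix} 3 & 0 & 2 \\ 5 & 1 & 0 \end{bmatrix};
$$
$$
\mathbf{AB} = (1, 2) \begin{bmatrix} 3 & 4 \\ 2 & 1 \end{bmatrix} = (7, 6), \quad
(\mathbf{AB})\mathbf{C} = (7, 6) \begin{bmatrix} 3 & 0 & 2 \\ 5 & 1 & 0 \end{bmatrix} = (51, 6, 14),
$$
$$
\mathbf{BC} = \begin{bmatrix} 3 & 4 \\ 2 & 1 \end{bmatrix}
\begin{bmatrix} 3 & 0 & 2 \\ 5 & 1 & 0 \end{bmatrix}
= \begin{bmatrix} 29 & 4 & 6 \\ 11 & 1 & 4 \end{bmatrix}, \quad
\mathbf{A}(\mathbf{BC}) = (1, 2) \begin{bmatrix} 29 & 4 & 6 \\ 11 & 1 & 4 \end{bmatrix} = (51, 6, 14).
$$

(2)
$$
\mathbf{A} = (1, 2), \quad
\mathbf{B} = \begin{bmatrix} 3 & 4 \\ 0 & 5 \end{bmatrix}, \quad
\mathbf{C} = \begin{bmatrix} 4 & 2 \\ 1 & 7 \end{bmatrix};
$$
$$
\mathbf{B} + \mathbf{C} = \begin{bmatrix} 7 & 6 \\ 1 & 12 \end{bmatrix}, \quad
\mathbf{A}(\mathbf{B} + \mathbf{C}) = (1, 2) \begin{bmatrix} 7 & 6 \\ 1 & 12 \end{bmatrix} = (9, 30),
$$
$$
\mathbf{AB} = (1, 2) \begin{bmatrix} 3 & 4 \\ 0 & 5 \end{bmatrix} = (3, 14), \quad
\mathbf{AC} = (1, 2) \begin{bmatrix} 4 & 2 \\ 1 & 7 \end{bmatrix} = (6, 16),
$$
$$
\mathbf{AB} + \mathbf{AC} = (9, 30).
$$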
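
Both laws can be checked on the numbers of the two examples above. The Python sketch below is only a numerical confirmation, not a proof; `matmul` and `add` are our own helper names, and matrices are assumed to be stored as lists of rows.

```python
def matmul(A, B):
    """Return AB computed from c_ij = sum_k a_ik * b_kj (Eq. 3-19)."""
    if len(A[0]) != len(B):
        raise ValueError("A and B are not conformable for multiplication")
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def add(A, B):
    """Return A + B for two matrices of the same size (Eq. 3-4)."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


# Example (1): associative law (AB)C = A(BC).
A = [[1, 2]]
B = [[3, 4], [2, 1]]
C = [[3, 0, 2], [5, 1, 0]]
left = matmul(matmul(A, B), C)
right = matmul(A, matmul(B, C))

# Example (2): distributive law A(B2 + C2) = A B2 + A C2.
B2 = [[3, 4], [0, 5]]
C2 = [[4, 2], [1, 7]]
dist_left = matmul(A, add(B2, C2))
dist_right = add(matmul(A, B2), matmul(A, C2))

print(left, right)
print(dist_left, dist_right)
```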


3-5 Vectors and matrices. The reader has no doubt noticed a pro¬ 
nounced similarity between matrices of one row or column and vectors. 
Indeed, even the notation is the same. We shall now point out the com¬ 
plete equivalence between vectors and matrices having a single row or 
column. First, if one examines the definitions of equality, addition, and 
multiplication by a scalar for both vectors and matrices, it will be ob¬ 
served that they are equivalent when a matrix has a single row or column. 
Furthermore, the scalar product of a row vector and a column vector 
corresponds precisely to our definition of matrix multiplication, when
the premultiplier is a row matrix, and the postmultiplier a column matrix.*
We shall see later that the notation indicating the scalar product of two 
column vectors or two row vectors will again be identical with the appro¬ 
priate matrix notation. Hence, there is a complete equivalence between 
matrices of one row or one column and vectors. We shall not distinguish 
between them. Consequently, we shall continue to use lower case bold¬ 
face letters for vectors or, equivalently, for matrices of one row or column 
rather than the upper case boldface type denoting matrices in general. 
This distinction makes it easy to differentiate between vectors and 
matrices having more than a single row and a single column. Following 
this convention, we shall henceforth write a system of simultaneous linear 
equations as Ax = d and abstain from the notation AX = D used in 
Section 3-3. 

It has already been suggested that the element $c_{ij}$ of the product AB
is found by forming the scalar product of the $i$th row of matrix A with
the $j$th column of B. On many occasions, it is convenient to represent the
rows or columns of a matrix as vectors. Suppose that A is an $m \times n$
matrix. If we define the column vectors as

$$\mathbf{a}_j = [a_{1j}, \ldots, a_{mj}], \tag{3-27}$$

then A can be written as a row of column vectors,

$$\mathbf{A} = (\mathbf{a}_1, \ldots, \mathbf{a}_n). \tag{3-28}$$

Similarly, if we define the row vectors as

$$\mathbf{a}^i = (a_{i1}, \ldots, a_{in}), \tag{3-29}$$

then A can be written as a column of row vectors,

$$\mathbf{A} = [\mathbf{a}^1, \ldots, \mathbf{a}^m]. \tag{3-30}$$

(To refresh the memory: Square brackets mean that a column is printed
as a row.)

Let us consider the product AB where A is $m \times n$ and B is $n \times r$.
Represent A as a column of row vectors and B as a row of column vectors,
that is,

$$\mathbf{A} = [\mathbf{a}^1, \ldots, \mathbf{a}^m], \quad \mathbf{B} = (\mathbf{b}_1, \ldots, \mathbf{b}_r).$$


* The careful reader will note that the scalar product of two vectors is a 
scalar, while the matrix product of a row and column is a matrix having one 
element. There is a complete equivalence between numbers and 1X1 matrices. 
The details of demonstrating this equivalence are left to Problem 3-56.

Thus

$$
\mathbf{AB} = [\mathbf{a}^1, \ldots, \mathbf{a}^m](\mathbf{b}_1, \ldots, \mathbf{b}_r) =
\begin{bmatrix}
\mathbf{a}^1\mathbf{b}_1 & \cdots & \mathbf{a}^1\mathbf{b}_r \\
\vdots & & \vdots \\
\mathbf{a}^m\mathbf{b}_1 & \cdots & \mathbf{a}^m\mathbf{b}_r
\end{bmatrix}.
\tag{3-31}
$$

Here, we can see clearly that each element of the product matrix is a
scalar product of a row from A and a column from B. As noted above,
matrices will be frequently represented as a row of column vectors or as
a column of row vectors. Column vectors will always be denoted by subscripts
($\mathbf{a}_j$), rows by superscripts ($\mathbf{a}^i$).

Example: Let A be an arbitrary $m \times n$ matrix. Then

$$\mathbf{A}\mathbf{e}_j = \mathbf{a}_j. \tag{3-32}$$

This is easily seen by direct multiplication; it can also be shown to be true
by writing

$$\mathbf{A}\mathbf{e}_j = [\mathbf{a}^1\mathbf{e}_j, \ldots, \mathbf{a}^m\mathbf{e}_j] = [a_{1j}, \ldots, a_{mj}] = \mathbf{a}_j.$$
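
Equation (3-32) says that postmultiplying A by the unit vector $\mathbf{e}_j$ picks out the $j$th column of A. A brief Python sketch follows, assuming matrices stored as lists of rows and columns stored as $n \times 1$ matrices; the helper names `matmul` and `unit_vector` are ours, and the sample matrix is the 3 × 2 matrix A from example (4) of Section 3-4.

```python
def matmul(A, B):
    """Return AB computed from c_ij = sum_k a_ik * b_kj (Eq. 3-19)."""
    if len(A[0]) != len(B):
        raise ValueError("A and B are not conformable for multiplication")
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def unit_vector(n, j):
    """Return e_j as an n x 1 column; j is counted from 1 as in the text."""
    return [[1 if i == j else 0] for i in range(1, n + 1)]


A = [[3, 0],
     [1, 1],
     [5, 2]]

first_column = matmul(A, unit_vector(2, 1))    # A e_1 = a_1
second_column = matmul(A, unit_vector(2, 2))   # A e_2 = a_2
print(first_column, second_column)
```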

3-6 Identity, scalar, diagonal, and null matrices. The number one in
the real number system has the property that for any other number $a$,
$1a = a1 = a$; also $1(1) = 1$. A matrix with similar properties is called
the identity or unit matrix.

Identity matrix: The identity matrix of order n, written I or $\mathbf{I}_n$, is a
square matrix having ones along the main diagonal (the diagonal running
from upper left to lower right) and zeros elsewhere.

$$
\mathbf{I} = \mathbf{I}_n =
\begin{bmatrix}
1 & 0 & 0 & \cdots & 0 \\
0 & 1 & 0 & \cdots & 0 \\
0 & 0 & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & & \vdots \\
0 & 0 & 0 & \cdots & 1
\end{bmatrix}.
\tag{3-33}
$$

If we write $\mathbf{I} = \|\delta_{ij}\|$, then

$$
\delta_{ij} =
\begin{cases}
1, & i = j, \\
0, & i \neq j.
\end{cases}
\tag{3-34}
$$

The symbol $\delta_{ij}$, defined by (3-34), is called the Kronecker delta. The
symbol $\delta_{ij}$ will always refer to the Kronecker delta unless otherwise specified.
It is tacitly assumed that in (3-34) the indices $i$, $j$ run from 1 to $n$.

Occasionally, we shall also find it convenient to write an identity matrix
as a row of column vectors. The column vectors are the unit vectors $\mathbf{e}_i$.
Thus

$$\mathbf{I} = (\mathbf{e}_1, \ldots, \mathbf{e}_n). \tag{3-35}$$

Frequently, we must deal with several identity matrices of different sizes.
Differentiation between such matrices is facilitated by writing $\mathbf{I}_n$, where
the subscript indicates the size of I.

Example: The identity matrix of order 2 is

$$\mathbf{I}_2 = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix},$$

and of order 3,

$$\mathbf{I}_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}.$$

If A is a square matrix of order $n$ and I the identity matrix of order $n$,
I commutes with A:

IA = AI = A. (3-36)

To prove this, write B = IA, and note that by (3-34)

$$b_{ij} = \sum_{k=1}^{n} \delta_{ik} a_{kj} = a_{ij}.$$

Similarly, AI = A. If we take A = I,

II = I.

It seems natural to write II as $\mathbf{I}^2$. Thus

$$\mathbf{I}^2 = \mathbf{I} \quad \text{and hence} \quad \mathbf{I}^3 = \mathbf{I}, \text{ etc.} \tag{3-37}$$

If A is not square, then our result cannot be true, since either IA or AI
will not be defined. However, if A is $m \times n$, then $\mathbf{I}_m\mathbf{A} = \mathbf{A}$ and $\mathbf{A}\mathbf{I}_n = \mathbf{A}$.
Hence, we can always replace any $m \times n$ matrix A by $\mathbf{I}_m\mathbf{A}$ or $\mathbf{A}\mathbf{I}_n$, without
changing the expression. This substitution is frequently useful.
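
The role of the subscript on I can be seen in a small computation. In the Python sketch below, `identity` builds $\mathbf{I}_n$ directly from the Kronecker delta of (3-34); both `identity` and `matmul` are our own helper names, and the 2 × 3 matrix is hypothetical, chosen only for illustration.

```python
def matmul(A, B):
    """Return AB computed from c_ij = sum_k a_ik * b_kj (Eq. 3-19)."""
    if len(A[0]) != len(B):
        raise ValueError("A and B are not conformable for multiplication")
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def identity(n):
    """Return I_n, whose (i, j) element is the Kronecker delta (Eq. 3-34)."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


A = [[1, 3, 4],
     [0, 1, 0]]                       # a hypothetical 2 x 3 matrix

left = matmul(identity(2), A)         # I_m A with m = 2
right = matmul(A, identity(3))        # A I_n with n = 3
print(left == A, right == A)

try:
    matmul(identity(3), A)            # I_3 A is not defined for a 2 x 3 matrix
    defined = True
except ValueError:
    defined = False
print(defined)
```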

Let us consider again a square matrix of order $n$. Then

$$\mathbf{AA} = \mathbf{A}^2; \quad \mathbf{AAA} = \mathbf{A}^3; \quad \mathbf{A}^k = \mathbf{A}\mathbf{A}^{k-1}, \quad k = 4, 5, \ldots,$$

are also square matrices of order $n$. By analogy to the $k$th-degree polynomial

$$\lambda_k x^k + \lambda_{k-1} x^{k-1} + \cdots + \lambda_0$$

in the real number $x$, we can construct the matrix polynomial

$$\lambda_k \mathbf{A}^k + \lambda_{k-1} \mathbf{A}^{k-1} + \cdots + \lambda_0 \mathbf{I}, \tag{3-38}$$

where the $\lambda_i$ are, of course, scalars. It is important to note that $\lambda_0$ must
multiply I, not the number 1. The matrix polynomial is an $n \times n$ matrix.
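
One way to evaluate (3-38) is sketched below in Python. The nested (Horner) arrangement used here is our own choice of method, not something prescribed by the text, and the names `matmul` and `matrix_polynomial` as well as the sample matrix are hypothetical. Note how the constant term is added along the main diagonal only, which is exactly the statement that $\lambda_0$ multiplies I.

```python
def matmul(A, B):
    """Return AB computed from c_ij = sum_k a_ik * b_kj (Eq. 3-19)."""
    if len(A[0]) != len(B):
        raise ValueError("A and B are not conformable for multiplication")
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def matrix_polynomial(coeffs, A):
    """Evaluate lam_k A^k + ... + lam_1 A + lam_0 I for a square matrix A.

    coeffs lists the scalars from lam_k down to lam_0 (highest power first).
    """
    n = len(A)
    result = [[0] * n for _ in range(n)]
    for lam in coeffs:
        result = matmul(result, A)      # raise every accumulated power by one
        for i in range(n):
            result[i][i] += lam         # add lam * I, not the number lam
    return result


A = [[1, 1],
     [0, 1]]
# x^2 - 2x + 1 = (x - 1)^2, so the polynomial in A equals (A - I)^2.
print(matrix_polynomial([1, -2, 1], A))
print(matrix_polynomial([2, 0, 3], A))   # 2A^2 + 3I
```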

Scalar matrix: For any scalar $\lambda$, the square matrix

$$\mathbf{S} = \|\lambda\,\delta_{ij}\| = \lambda\mathbf{I} \tag{3-39}$$

is called a scalar matrix.

Problem 3-55 will ask you to show that there is an exact equivalence
between the scalar matrices $\lambda\mathbf{I}$ and the real numbers $\lambda$.

Diagonal matrix: A square matrix

$$\mathbf{D} = \|\lambda_i\,\delta_{ij}\| \tag{3-40}$$

is called a diagonal matrix.

Note that the $\lambda_i$ may vary with $i$.

Examples: 



is a scalar matrix. 


(2)
$$\begin{bmatrix} 2 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 3 \end{bmatrix}$$
is a diagonal matrix.

In the real number system, zero has the property $a0 = 0a = 0$ and
$a + 0 = a$. A matrix with similar properties is called the null matrix or
zero matrix.

Null matrix: A matrix whose elements are all zero is called a null or 
zero matrix and is denoted by 0. 

A null matrix does not need to be square. 


The following are null matrices: 





0 

o 

II 

o 1 

o 

II 

o 1 

o 

o 

0 = 

0 

Lo oj Lo o oj 


,0. 


When the operations are defined, we have

A + 0 = A = 0 + A, (3-41)

A − A = 0, (3-42)

A0 = 0, (3-43)

0A = 0. (3-44)

When A, 0 are square matrices of order $n$,

A0 = 0A = 0. (3-45)

If $a$, $b$ are real numbers, then $ab = 0$ means that $a = 0$, or $b = 0$,
or $a$ and $b = 0$. The matrix equation

AB = 0, (3-46)

however, does not imply that A = 0 or B = 0. It is easy to find non-null
matrices A, B whose product AB = 0. This is another case where matrix
operations do not follow the behavior of real numbers.

Example: 


\ _ 

4~ 


0 


1 

0 

0" 

Lo 

0_ 

L-i 

0. 


,0 

0_ 


This product of two matrices is a null matrix, although neither of the 
factors is a null matrix. 
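
A further instance of (3-46) with non-null factors can be checked by machine. The two matrices in this Python sketch are hypothetical, chosen by us so that every row of A has scalar product zero with every column of B; `matmul` and `is_null` are our own helper names.

```python
def matmul(A, B):
    """Return AB computed from c_ij = sum_k a_ik * b_kj (Eq. 3-19)."""
    if len(A[0]) != len(B):
        raise ValueError("A and B are not conformable for multiplication")
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def is_null(A):
    """Return True when every element of A is zero."""
    return all(a == 0 for row in A for a in row)


A = [[1, 1],
     [1, 1]]
B = [[1, -1],
     [-1, 1]]

product = matmul(A, B)
print(product)
print(is_null(A), is_null(B), is_null(product))
```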


3-7 The transpose. Sometimes it is of interest to interchange the rows 
and columns of a matrix. This new form is called the transpose of the 
original matrix. 

Transpose: The transpose of a matrix $\mathbf{A} = \|a_{ij}\|$ is a matrix formed
from A by interchanging rows and columns such that row $i$ of A becomes
column $i$ of the transposed matrix. The transpose is denoted by A' and

$$\mathbf{A}' = \|a_{ji}\| \quad \text{when} \quad \mathbf{A} = \|a_{ij}\|. \tag{3-47}$$

If $a'_{ij}$ is the $ij$th element of A', then $a'_{ij} = a_{ji}$.

Examples:

(1)
$$
\mathbf{A} = \begin{bmatrix} 1 & 3 \\ 2 & 5 \end{bmatrix}; \quad
\mathbf{A}' = \begin{bmatrix} 1 & 2 \\ 3 & 5 \end{bmatrix}.
$$

(2)
$$
\mathbf{A} = \begin{bmatrix} 1 & 3 & 4 \\ 0 & 1 & 0 \end{bmatrix}; \quad
\mathbf{A}' = \begin{bmatrix} 1 & 0 \\ 3 & 1 \\ 4 & 0 \end{bmatrix}.
$$

It will be observed that if A is $m \times n$, A' is $n \times m$.

If the sum C = A + B is defined, then

C' = A' + B'. (3-48)

We prove this by noting that

$$c'_{ij} = c_{ji} = a_{ji} + b_{ji} = a'_{ij} + b'_{ij}.$$

It is interesting to consider the transpose of the product of two matrices.
If AB is defined, then

(AB)' = B'A', (3-49)

that is, the transpose of the product is the product of the transposes in reverse
order. To prove this, let C = AB. Thus,

$$c'_{ij} = c_{ji} = \sum_{k} a_{jk} b_{ki} = \sum_{k} a'_{kj} b'_{ik} = \sum_{k} b'_{ik} a'_{kj}.$$

Hence we have shown that the $ij$th element of C' is the $ij$th element of
B'A', and (3-49) follows. Suppose A is $m \times n$ and B is $n \times r$ so that AB
is defined. Then A' is $n \times m$ and B' is $r \times n$. Thus if AB is defined,
then B'A' is defined also. The same result as in (3-49) holds for any
finite number of factors, i.e.,

$$(\mathbf{A}_1 \mathbf{A}_2 \cdots \mathbf{A}_n)' = \mathbf{A}'_n \cdots \mathbf{A}'_2 \mathbf{A}'_1. \tag{3-50}$$

This is easily proved by induction; the details are left to be worked out
in Problem 3-25.
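
The reversal of order in (3-49) can be verified on a small case. In the Python sketch below, A is the 2 × 3 matrix whose transpose is worked out in the examples of this section, while B is a hypothetical 3 × 2 matrix of our own choosing; `transpose` and `matmul` are illustrative helper names, not notation from the text.

```python
def matmul(A, B):
    """Return AB computed from c_ij = sum_k a_ik * b_kj (Eq. 3-19)."""
    if len(A[0]) != len(B):
        raise ValueError("A and B are not conformable for multiplication")
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def transpose(A):
    """Return A', in which row i of A becomes column i (Eq. 3-47)."""
    return [list(column) for column in zip(*A)]


A = [[1, 3, 4],
     [0, 1, 0]]          # 2 x 3
B = [[2, 1],
     [0, 1],
     [1, 0]]             # 3 x 2, hypothetical

lhs = transpose(matmul(A, B))               # (AB)'
rhs = matmul(transpose(B), transpose(A))    # B'A'
print(lhs, rhs, lhs == rhs)

# A'B' is also defined here, but it is 3 x 3 and so cannot equal (AB)'.
wrong_order = matmul(transpose(A), transpose(B))
print(len(wrong_order), len(wrong_order[0]))
```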


Example: 



It will be noted that

I' = I. (3-51)

Also,

(A')' = A, (3-52)

since

$$(a'_{ij})' = a'_{ji} = a_{ij}.$$


It should now be clear why we used a'b for the scalar product of two 
column vectors and ab' for the scalar product of two row vectors. The 
transpose symbol was introduced so that the product, expressed in matrix 
terms, would be defined.

3-8 Symmetric and skew-symmetric matrices.

Symmetric matrix: A symmetric matrix is a matrix A for which

A = A'. (3-53)

Clearly, a symmetric matrix must be square; it is symmetric about the
main diagonal, that is, a reflection in the main diagonal leaves the matrix
unchanged. An $n$th-order symmetric matrix does not have $n^2$ arbitrary
elements since $a_{ij} = a_{ji}$ both below and above the main diagonal. The
number of elements above the main diagonal is $(n^2 - n)/2$. The diagonal
elements are also arbitrary. Thus the total number of arbitrary elements
in an $n$th-order symmetric matrix is

$$\frac{n^2 - n}{2} + n = \frac{n(n + 1)}{2}.$$

Example: The following is a symmetric matrix:

$$\begin{bmatrix} 2 & 0 & 7 \\ 0 & 3 & 5 \\ 7 & 5 & 1 \end{bmatrix}.$$

Skew-symmetric matrix: A skew-symmetric matrix is a matrix A for
which

A = −A'. (3-54)

A skew-symmetric matrix is also a square matrix, and

$$a_{ij} = -a_{ji}.$$

Hence, the diagonal elements are zero, $a_{ii} = 0$, and the number of arbitrary
elements in an $n$th-order skew-symmetric matrix is

$$\frac{n(n - 1)}{2}.$$

Example: The following matrix is skew-symmetric:

$$\begin{bmatrix} 0 & 1 & 2 \\ -1 & 0 & -3 \\ -2 & 3 & 0 \end{bmatrix}.$$

Any square matrix can be written as the sum of a symmetric and a
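skew-symmetric matrix:

$$\mathbf{A} = \frac{\mathbf{A} + \mathbf{A}'}{2} + \frac{\mathbf{A} - \mathbf{A}'}{2}. \tag{3-55}$$

Now

$$(\mathbf{A} + \mathbf{A}')' = \mathbf{A}' + \mathbf{A} = \mathbf{A} + \mathbf{A}', \qquad (\mathbf{A} - \mathbf{A}')' = \mathbf{A}' - \mathbf{A} = -(\mathbf{A} - \mathbf{A}'). \tag{3-56}$$

If we write

$$\mathbf{A}_s = \frac{\mathbf{A} + \mathbf{A}'}{2}, \qquad \mathbf{A}_a = \frac{\mathbf{A} - \mathbf{A}'}{2}, \tag{3-57}$$

then $\mathbf{A}_s$ is symmetric, and $\mathbf{A}_a$ is skew-symmetric. We have thus expressed
the square matrix A as the sum of a symmetric and a skew-symmetric
matrix.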
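
The decomposition can be carried out mechanically by forming (A + A')/2 and (A − A')/2. The Python sketch below is an illustration under our own assumptions: it uses exact fractions so that the halves are not rounded, the function names `symmetric_part` and `skew_part` are ours, and the third-order matrix is hypothetical.

```python
from fractions import Fraction


def transpose(A):
    """Return A', in which row i of A becomes column i."""
    return [list(column) for column in zip(*A)]


def symmetric_part(A):
    """Return (A + A')/2 for a square matrix A with integer elements."""
    At = transpose(A)
    n = len(A)
    return [[Fraction(A[i][j] + At[i][j], 2) for j in range(n)] for i in range(n)]


def skew_part(A):
    """Return (A - A')/2 for a square matrix A with integer elements."""
    At = transpose(A)
    n = len(A)
    return [[Fraction(A[i][j] - At[i][j], 2) for j in range(n)] for i in range(n)]


A = [[2, 1, 4],
     [3, 0, 6],
     [0, 2, 5]]          # a hypothetical square matrix

A_s = symmetric_part(A)
A_a = skew_part(A)
total = [[s + a for s, a in zip(rs, ra)] for rs, ra in zip(A_s, A_a)]

print(A_s == transpose(A_s))                                   # A_s is symmetric
print(A_a == [[-x for x in row] for row in transpose(A_a)])    # A_a = -A_a'
print(total == A)                                              # A_s + A_a = A
```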


Example: 


A = 


2 

-4 

i 


2 

-* 

¥ 

3 

1 

9 

A +A' 

2 = 


i 

¥ 

_8 

6 

9. 


_ ¥ 

¥ 

9 . 


A - A' 


0 \ 

i . o | 

Li -* o. 


A + A' A - A' 

O ' O 


3-9 Partitioning of matrices. It is often necessary to study some subset 
of elements in a matrix which form a so-called submatrix. 

Submatrix: If we cross out all but k rows and s columns of an m X n 
matrix A, the resulting k X s matrix is called a submatrix of A. 

Example: If we cross out rows 1, 3 and columns 2, 6 of the $4 \times 6$ matrix
$\mathbf{A} = \|a_{ij}\|$, we are left with the $2 \times 4$ submatrix

$$\begin{bmatrix} a_{21} & a_{23} & a_{24} & a_{25} \\ a_{41} & a_{43} & a_{44} & a_{45} \end{bmatrix}.$$

For a number of reasons we wish to introduce here the notion of partition¬ 
ing matrices into submatrices. Some of these reasons are: 

(1) The partitioning may simplify the writing or printing of A. 

(2) It exhibits some particular structure of A which is of interest. 

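(3) It simplifies computation.

We have already introduced one special case of partitioning when we
wrote a matrix as a column of row vectors or a row of column vectors.
We have seen that an $m \times n$ matrix $\mathbf{B}$ could be written

\[ \mathbf{B} = \|b_{ij}\| = (\mathbf{b}_1, \ldots, \mathbf{b}_n), \qquad \mathbf{b}_j = [b_{1j}, \ldots, b_{mj}]; \]

$\mathbf{B}$ has been partitioned into $n$ submatrices, the column vectors $\mathbf{b}_j$.
Suppose that we have an $r \times m$ matrix $\mathbf{A}$; the product $\mathbf{C} = \mathbf{A}\mathbf{B}$ is
then defined. The matrix $\mathbf{C}$ will be $r \times n$ and can be written

\[ \mathbf{C} = (\mathbf{c}_1, \ldots, \mathbf{c}_n), \qquad \mathbf{c}_j = [c_{1j}, \ldots, c_{rj}]. \]

However,

\[ c_{ij} = \sum_k a_{ik} b_{kj}, \]

or

\[ \mathbf{c}_j = \Bigl[\sum_k a_{1k} b_{kj},\ \sum_k a_{2k} b_{kj},\ \ldots,\ \sum_k a_{rk} b_{kj}\Bigr]. \]

This can be expressed in matrix form as follows:

\[ \mathbf{c}_j = \mathbf{A}\mathbf{b}_j. \tag{3-58} \]

Hence, we can write the product $\mathbf{A}\mathbf{B}$ as

\[ \mathbf{C} = \mathbf{A}\mathbf{B} = (\mathbf{A}\mathbf{b}_1, \mathbf{A}\mathbf{b}_2, \ldots, \mathbf{A}\mathbf{b}_n). \tag{3-59} \]

Each column of $\mathbf{C}$ is computed by multiplying the corresponding column
of $\mathbf{B}$ by $\mathbf{A}$.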
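
The column-by-column rule (3-59) translates directly into code. The sketch below is only an illustration in plain Python; the helper names `matvec` and `matmul_by_columns` and the two small matrices are assumptions made for the example.

```python
def matvec(a, x):
    """Product of the matrix a (list of rows) with the column vector x."""
    return [sum(a_ik * x_k for a_ik, x_k in zip(row, x)) for row in a]

def matmul_by_columns(a, b):
    """Build C = AB one column at a time: c_j = A b_j, as in (3-59)."""
    cols_b = list(zip(*b))                       # the columns b_1, ..., b_n
    cols_c = [matvec(a, b_j) for b_j in cols_b]  # c_j = A b_j
    return [list(row) for row in zip(*cols_c)]   # reassemble columns into rows

A = [[1, 2], [3, 4], [5, 6]]   # r x m = 3 x 2 (illustrative)
B = [[1, 0, 2], [0, 1, 3]]     # m x n = 2 x 3 (illustrative)
C = matmul_by_columns(A, B)    # r x n = 3 x 3
```

Here the first two columns of `B` are unit vectors, so the first two columns of `C` reproduce the columns of `A`.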

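We shall now consider partitioning from a more general point of view:
We have the matrix

\[
\mathbf{A} = \left[\begin{array}{ccc|c}
a_{11} & a_{12} & a_{13} & a_{14} \\
a_{21} & a_{22} & a_{23} & a_{24} \\ \hline
a_{31} & a_{32} & a_{33} & a_{34} \\
a_{41} & a_{42} & a_{43} & a_{44} \\
a_{51} & a_{52} & a_{53} & a_{54}
\end{array}\right]. \tag{3-60}
\]

Imagine $\mathbf{A}$ to be divided up by the partition lines shown. Now, if we write

\[
\mathbf{A}_{11} = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{bmatrix}, \qquad
\mathbf{A}_{12} = \begin{bmatrix} a_{14} \\ a_{24} \end{bmatrix},
\]
\[
\mathbf{A}_{21} = \begin{bmatrix} a_{31} & a_{32} & a_{33} \\ a_{41} & a_{42} & a_{43} \\ a_{51} & a_{52} & a_{53} \end{bmatrix}, \qquad
\mathbf{A}_{22} = \begin{bmatrix} a_{34} \\ a_{44} \\ a_{54} \end{bmatrix}, \tag{3-61}
\]

then $\mathbf{A}$ can be written

\[
\mathbf{A} = \begin{bmatrix} \mathbf{A}_{11} & \mathbf{A}_{12} \\ \mathbf{A}_{21} & \mathbf{A}_{22} \end{bmatrix}. \tag{3-62}
\]

Matrix $\mathbf{A}$ has been partitioned into four submatrices, the $\mathbf{A}_{ij}$. The
submatrices will be denoted by upper-case boldface roman letters;
sometimes, if they are vectors, by lower-case boldface type.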

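If partitioning is to be of any use for computations, we must be able
to perform the usual operations of addition and multiplication in
partitioned form. Clearly, if

\[
\mathbf{A} = \begin{bmatrix} \mathbf{A}_{11} & \mathbf{A}_{12} \\ \mathbf{A}_{21} & \mathbf{A}_{22} \end{bmatrix}, \qquad
\mathbf{B} = \begin{bmatrix} \mathbf{B}_{11} & \mathbf{B}_{12} \\ \mathbf{B}_{21} & \mathbf{B}_{22} \end{bmatrix}, \tag{3-63}
\]

then

\[
\mathbf{A} + \mathbf{B} = \begin{bmatrix} \mathbf{A}_{11} + \mathbf{B}_{11} & \mathbf{A}_{12} + \mathbf{B}_{12} \\ \mathbf{A}_{21} + \mathbf{B}_{21} & \mathbf{A}_{22} + \mathbf{B}_{22} \end{bmatrix}, \tag{3-64}
\]

provided that for every $\mathbf{A}_{ij}$ the corresponding $\mathbf{B}_{ij}$ has the same number
of rows and the same number of columns as $\mathbf{A}_{ij}$. Hence the rule for addition
of partitioned matrices is the same as the rule for addition of ordinary matrices
if the submatrices are conformable for addition. In other words, the matrices
can be added "by blocks." However, $\mathbf{A}$, $\mathbf{B}$ must be partitioned "in the same
way" if (3-64) is to hold.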

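Next, we shall consider multiplication of partitioned matrices. We
would like the formula for block multiplication to follow the pattern of
the usual formula

\[ \mathbf{C}_{ij} = \sum_k \mathbf{A}_{ik} \mathbf{B}_{kj}, \tag{3-65} \]

where the elements in (3-19) are now submatrices in (3-65). We note at
once that this rule cannot hold unless $\mathbf{A}_{ik}$ and $\mathbf{B}_{kj}$ are conformable for
multiplication, that is, the number of columns in submatrix $\mathbf{A}_{ik}$ must be
the same as the number of rows in submatrix $\mathbf{B}_{kj}$. This will be true only
if the columns of $\mathbf{A}$ are partitioned in the same way as the rows of $\mathbf{B}$.

We wish to show that multiplication by blocks does follow the ordinary
rules for multiplication: if

\[
\mathbf{A} = \begin{bmatrix} \mathbf{A}_{11} & \mathbf{A}_{12} \\ \mathbf{A}_{21} & \mathbf{A}_{22} \end{bmatrix}, \qquad
\mathbf{B} = \begin{bmatrix} \mathbf{B}_{11} & \mathbf{B}_{12} \\ \mathbf{B}_{21} & \mathbf{B}_{22} \end{bmatrix},
\]

then

\[
\mathbf{C} = \mathbf{A}\mathbf{B} =
\begin{bmatrix}
\mathbf{A}_{11}\mathbf{B}_{11} + \mathbf{A}_{12}\mathbf{B}_{21} & \mathbf{A}_{11}\mathbf{B}_{12} + \mathbf{A}_{12}\mathbf{B}_{22} \\
\mathbf{A}_{21}\mathbf{B}_{11} + \mathbf{A}_{22}\mathbf{B}_{21} & \mathbf{A}_{21}\mathbf{B}_{12} + \mathbf{A}_{22}\mathbf{B}_{22}
\end{bmatrix}
= \begin{bmatrix} \mathbf{C}_{11} & \mathbf{C}_{12} \\ \mathbf{C}_{21} & \mathbf{C}_{22} \end{bmatrix}, \tag{3-66}
\]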

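provided that the submatrices $\mathbf{A}_{ik}$, $\mathbf{B}_{kj}$ are conformable for
multiplication. As usual, if $\mathbf{A}_{ik}$ is $m \times n$ and $\mathbf{B}_{kj}$ is $n \times r$, then
$\mathbf{C}_{ij}$ is $m \times r$.

It is not hard to see that multiplication by blocks is correct. Suppose,
for example, that we have a $3 \times 3$ matrix $\mathbf{A}$ and partition it as

\[
\mathbf{A} = \left[\begin{array}{cc|c}
a_{11} & a_{12} & a_{13} \\
a_{21} & a_{22} & a_{23} \\ \hline
a_{31} & a_{32} & a_{33}
\end{array}\right]
= \begin{bmatrix} \mathbf{A}_{11} & \mathbf{A}_{12} \\ \mathbf{A}_{21} & \mathbf{A}_{22} \end{bmatrix}, \tag{3-67}
\]

and a $3 \times 2$ matrix $\mathbf{B}$ which we partition as

\[
\mathbf{B} = \left[\begin{array}{cc}
b_{11} & b_{12} \\
b_{21} & b_{22} \\ \hline
b_{31} & b_{32}
\end{array}\right]
= \begin{bmatrix} \mathbf{B}_{11} \\ \mathbf{B}_{21} \end{bmatrix}. \tag{3-68}
\]

Then any element of $\mathbf{C} = \|c_{ij}\| = \mathbf{A}\mathbf{B}$ is

\[ c_{ij} = (a_{i1} b_{1j} + a_{i2} b_{2j}) + a_{i3} b_{3j}. \tag{3-69} \]

The quantity $a_{i1} b_{1j} + a_{i2} b_{2j}$ is simply the $ij$th element of $\mathbf{A}_{11}\mathbf{B}_{11}$ if
$i \le 2$, and the element in column $j$ of $\mathbf{A}_{21}\mathbf{B}_{11}$ if $i = 3$. Similarly,
$a_{i3} b_{3j}$ is the $ij$th element of $\mathbf{A}_{12}\mathbf{B}_{21}$ if $i \le 2$, and the element in
column $j$ of $\mathbf{A}_{22}\mathbf{B}_{21}$ if $i = 3$. Thus $\mathbf{C}$ can be written

\[
\mathbf{C} = \begin{bmatrix} \mathbf{C}_{11} \\ \mathbf{C}_{21} \end{bmatrix}
= \begin{bmatrix} \mathbf{A}_{11}\mathbf{B}_{11} + \mathbf{A}_{12}\mathbf{B}_{21} \\ \mathbf{A}_{21}\mathbf{B}_{11} + \mathbf{A}_{22}\mathbf{B}_{21} \end{bmatrix}. \tag{3-70}
\]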


It is important to note that in partitioning matrices for multiplication,
the partition line between rows $s$ and $s + 1$ is drawn all the way across
the matrix so that there is a partition between rows $s$ and $s + 1$ in every
column. These partitions are never jagged lines like that shown in the
following matrix:

\[
\begin{bmatrix}
a_{11} & a_{12} & a_{13} & a_{14} \\
a_{21} \mid & a_{22} & a_{23} \mid & a_{24} \\
a_{31} & a_{32} & a_{33} & a_{34} \\
a_{41} & a_{42} & a_{43} & a_{44}
\end{bmatrix}
\]

Similarly, when we start drawing a partition line between columns $r$ and
$r + 1$, we draw it all the way down the matrix. Only by drawing
partitions "all the way across" and "all the way down" can we be sure that
the expression $\sum_k \mathbf{A}_{ik}\mathbf{B}_{kj}$ is defined in terms of addition and multiplication.
For example, the number of rows in each $\mathbf{A}_{ik}$ is the same for each $k$ and a
given $i$ only because every column is partitioned between rows $s$ and $s + 1$.

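Using the concept of partitioning, we can write the system of equations
$\mathbf{A}\mathbf{x} = \mathbf{b}$ in another useful way. If we write $\mathbf{A}$ as a row of column vectors
$(\mathbf{a}_1, \ldots, \mathbf{a}_n)$, then

\[ \mathbf{A}\mathbf{x} = \mathbf{a}_1 x_1 + \cdots + \mathbf{a}_n x_n = x_1 \mathbf{a}_1 + \cdots + x_n \mathbf{a}_n = \mathbf{b}. \tag{3-71} \]

Similarly, if we write $\mathbf{A}$ as a column of row vectors, $\mathbf{A} = [\mathbf{a}^1, \ldots, \mathbf{a}^m]$,
then

\[ [\mathbf{a}^1\mathbf{x}, \ldots, \mathbf{a}^m\mathbf{x}] = \mathbf{b}, \quad \text{or} \quad \mathbf{a}^i\mathbf{x} = b_i, \quad i = 1, \ldots, m. \tag{3-72} \]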
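
Example: We are given the matrix $\mathbf{A}$ and partition it as follows:

\[
\mathbf{A} = \left[\begin{array}{cc|c}
1 & 3 & 2 \\
2 & 5 & 0 \\ \hline
4 & 1 & 7
\end{array}\right]
= \begin{bmatrix} \mathbf{A}_{11} & \mathbf{A}_{12} \\ \mathbf{A}_{21} & \mathbf{A}_{22} \end{bmatrix},
\]
\[
\mathbf{A}_{11} = \begin{bmatrix} 1 & 3 \\ 2 & 5 \end{bmatrix}, \quad
\mathbf{A}_{12} = \begin{bmatrix} 2 \\ 0 \end{bmatrix}, \quad
\mathbf{A}_{21} = (4, 1), \quad
\mathbf{A}_{22} = (7).
\]

We also have matrix $\mathbf{B}$; this will be partitioned so that block
multiplication of $\mathbf{A}\mathbf{B}$ is defined, that is,

\[
\mathbf{B} = \left[\begin{array}{c|cc}
0 & 1 & 2 \\
2 & 4 & 5 \\ \hline
6 & 0 & 1
\end{array}\right]
= \begin{bmatrix} \mathbf{B}_{11} & \mathbf{B}_{12} \\ \mathbf{B}_{21} & \mathbf{B}_{22} \end{bmatrix},
\]
\[
\mathbf{B}_{11} = \begin{bmatrix} 0 \\ 2 \end{bmatrix}, \quad
\mathbf{B}_{12} = \begin{bmatrix} 1 & 2 \\ 4 & 5 \end{bmatrix}, \quad
\mathbf{B}_{21} = (6), \quad
\mathbf{B}_{22} = (0, 1).
\]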


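First, we shall compute $\mathbf{C} = \mathbf{A}\mathbf{B}$ by multiplying the matrices directly
without partitioning. This gives

\[
\mathbf{C} = \begin{bmatrix} 1 & 3 & 2 \\ 2 & 5 & 0 \\ 4 & 1 & 7 \end{bmatrix}
\begin{bmatrix} 0 & 1 & 2 \\ 2 & 4 & 5 \\ 6 & 0 & 1 \end{bmatrix}
= \begin{bmatrix} 18 & 13 & 19 \\ 10 & 22 & 29 \\ 44 & 8 & 20 \end{bmatrix}.
\]

If we use block multiplication, $\mathbf{C}$ should be

\[
\begin{bmatrix}
\mathbf{A}_{11}\mathbf{B}_{11} + \mathbf{A}_{12}\mathbf{B}_{21} & \mathbf{A}_{11}\mathbf{B}_{12} + \mathbf{A}_{12}\mathbf{B}_{22} \\
\mathbf{A}_{21}\mathbf{B}_{11} + \mathbf{A}_{22}\mathbf{B}_{21} & \mathbf{A}_{21}\mathbf{B}_{12} + \mathbf{A}_{22}\mathbf{B}_{22}
\end{bmatrix}.
\]

However,

\[
\mathbf{A}_{11}\mathbf{B}_{11} + \mathbf{A}_{12}\mathbf{B}_{21}
= \begin{bmatrix} 1 & 3 \\ 2 & 5 \end{bmatrix}\begin{bmatrix} 0 \\ 2 \end{bmatrix} + \begin{bmatrix} 2 \\ 0 \end{bmatrix}(6)
= \begin{bmatrix} 6 \\ 10 \end{bmatrix} + \begin{bmatrix} 12 \\ 0 \end{bmatrix}
= \begin{bmatrix} 18 \\ 10 \end{bmatrix},
\]
\[
\mathbf{A}_{11}\mathbf{B}_{12} + \mathbf{A}_{12}\mathbf{B}_{22}
= \begin{bmatrix} 1 & 3 \\ 2 & 5 \end{bmatrix}\begin{bmatrix} 1 & 2 \\ 4 & 5 \end{bmatrix} + \begin{bmatrix} 2 \\ 0 \end{bmatrix}(0, 1)
= \begin{bmatrix} 13 & 17 \\ 22 & 29 \end{bmatrix} + \begin{bmatrix} 0 & 2 \\ 0 & 0 \end{bmatrix}
= \begin{bmatrix} 13 & 19 \\ 22 & 29 \end{bmatrix},
\]
\[
\mathbf{A}_{21}\mathbf{B}_{11} + \mathbf{A}_{22}\mathbf{B}_{21}
= (4, 1)\begin{bmatrix} 0 \\ 2 \end{bmatrix} + (7)(6) = (2) + (42) = (44),
\]
\[
\mathbf{A}_{21}\mathbf{B}_{12} + \mathbf{A}_{22}\mathbf{B}_{22}
= (4, 1)\begin{bmatrix} 1 & 2 \\ 4 & 5 \end{bmatrix} + (7)(0, 1) = (8, 13) + (0, 7) = (8, 20).
\]

Thus, by block multiplication,

\[
\mathbf{C} = \begin{bmatrix} 18 & 13 & 19 \\ 10 & 22 & 29 \\ 44 & 8 & 20 \end{bmatrix}.
\]

The same result was obtained by direct multiplication.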
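
The agreement between block and direct multiplication in this example can also be confirmed mechanically. The following sketch uses the matrices of the example; the helper functions `split`, `join`, `matmul`, and `matadd` are hypothetical names introduced only for this illustration.

```python
def matmul(a, b):
    """Ordinary matrix product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def matadd(a, b):
    """Element-by-element sum of two matrices of the same size."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def split(m, r, c):
    """Partition m after row r and after column c into four blocks."""
    return ([row[:c] for row in m[:r]], [row[c:] for row in m[:r]],
            [row[:c] for row in m[r:]], [row[c:] for row in m[r:]])

def join(c11, c12, c21, c22):
    """Reassemble four blocks into a single matrix."""
    top = [r1 + r2 for r1, r2 in zip(c11, c12)]
    bottom = [r1 + r2 for r1, r2 in zip(c21, c22)]
    return top + bottom

A = [[1, 3, 2], [2, 5, 0], [4, 1, 7]]
B = [[0, 1, 2], [2, 4, 5], [6, 0, 1]]

# The columns of A are split after column 2, so the rows of B must be split
# after row 2; the column split of B (after column 1) is free.
A11, A12, A21, A22 = split(A, 2, 2)
B11, B12, B21, B22 = split(B, 2, 1)

C_blocks = join(
    matadd(matmul(A11, B11), matmul(A12, B21)),
    matadd(matmul(A11, B12), matmul(A12, B22)),
    matadd(matmul(A21, B11), matmul(A22, B21)),
    matadd(matmul(A21, B12), matmul(A22, B22)),
)
C_direct = matmul(A, B)
```

Changing the row split of `A` or the column split of `B` still gives the same product, whereas splitting the columns of `A` and the rows of `B` at different places makes the block products undefined.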

We have seen that multiplication of matrices by blocks requires that
the partitioning of the columns in the premultiplier be the same as the
partitioning of the rows in the postmultiplier. It makes no difference at
all how the rows in the premultiplier or the columns in the postmultiplier
are partitioned. If we are considering the product $\mathbf{A}\mathbf{B}$, then the number
of columns in $\mathbf{A}_{ik}$ must be the same as the number of rows in $\mathbf{B}_{kj}$.

The results of this section have shown that when matrices are
appropriately partitioned for addition or multiplication, the submatrices
behave as if they were ordinary elements of the matrix. It should be
emphasized, however, that submatrices behave like ordinary elements only
when the matrices have been partitioned properly so that the operations
to be performed are defined.

We shall now interrupt our study of matrices for a while in order to
develop some aspects of the theory of determinants which will be needed
in our further investigation of the properties of matrices.

3-10 Basic notion of a determinant. The concept of a second- or
third-order determinant should be familiar to everyone. A second-order
determinant is defined to be

\[
\begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} = a_{11} a_{22} - a_{12} a_{21}. \tag{3-73}
\]

It is a number computed from the four elements of the square array
according to (3-73). Determinants occur naturally in the solution of
simultaneous linear equations. In elementary algebra, we learn that the
solution to two simultaneous linear equations in two unknowns,

\[
\begin{aligned}
a_{11} x_1 + a_{12} x_2 &= b_1, \\
a_{21} x_1 + a_{22} x_2 &= b_2,
\end{aligned} \tag{3-74}
\]

can be written (when the denominator does not vanish):

\[
x_1 = \frac{\begin{vmatrix} b_1 & a_{12} \\ b_2 & a_{22} \end{vmatrix}}{\begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix}}, \qquad
x_2 = \frac{\begin{vmatrix} a_{11} & b_1 \\ a_{21} & b_2 \end{vmatrix}}{\begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix}}. \tag{3-75}
\]

A third-order determinant is defined to be

\[
\begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix}
= a_{11} a_{22} a_{33} - a_{12} a_{21} a_{33} + a_{12} a_{23} a_{31}
- a_{13} a_{22} a_{31} + a_{13} a_{21} a_{32} - a_{11} a_{23} a_{32}. \tag{3-76}
\]

It is a number computed from the elements of a $3 \times 3$ array according
to formula (3-76). Third-order determinants arise in solving three
simultaneous linear equations in three unknowns.
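
Formulas (3-73), (3-75), and (3-76) can be written out directly. The sketch below is an illustration only: the function names `det2`, `det3`, and `solve2`, as well as the numerical system, are assumptions chosen for the example.

```python
from fractions import Fraction

def det2(a11, a12, a21, a22):
    """Second-order determinant, formula (3-73)."""
    return a11 * a22 - a12 * a21

def det3(a):
    """Third-order determinant, written term by term as in (3-76)."""
    (a11, a12, a13), (a21, a22, a23), (a31, a32, a33) = a
    return (a11 * a22 * a33 - a12 * a21 * a33 + a12 * a23 * a31
            - a13 * a22 * a31 + a13 * a21 * a32 - a11 * a23 * a32)

def solve2(a11, a12, a21, a22, b1, b2):
    """Solve two equations in two unknowns by the determinant ratios (3-75)."""
    d = det2(a11, a12, a21, a22)
    if d == 0:
        raise ValueError("the denominator vanishes; (3-75) does not apply")
    x1 = Fraction(det2(b1, a12, b2, a22), d)
    x2 = Fraction(det2(a11, b1, a21, b2), d)
    return x1, x2

# Illustrative system: 2 x1 + x2 = 5,  x1 + 3 x2 = 10.
x1, x2 = solve2(2, 1, 1, 3, 5, 10)
```

Substituting the computed values back into both equations is a quick check that the ratios were formed with the right columns replaced.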

It is of interest to generalize the notion of a determinant to $n \times n$
arrays. In the $2 \times 2$ and $3 \times 3$ cases, it will be observed that the
determinant is the sum of terms such that each term contains one and
only one element from each row and each column in the square array.
Furthermore, the number of elements in each term is the same as the
number of rows in the array, that is, no element is repeated. We notice
also some alternation in the sign of the terms. We expect that a more
general definition of a determinant will include these features.


3-11 General definition of a determinant. First, we wish to develop
some properties of permutations of numbers. A set of integers 1, . . . , n
is in "natural order" when the integers appear in the order 1, 2, 3, . . . , n. If
two integers are out of natural order in a set of n integers, then a larger
integer will precede a smaller one. For example, the natural order of the
first five integers, beginning with 1, is (1, 2, 3, 4, 5). When the integers 2
and 4 are interchanged, we obtain (1, 4, 3, 2, 5). The set is now out of
natural order because 4 precedes 3 and 2, and 3 precedes 2. Any
rearrangement of the natural order of n integers is called a permutation of these
integers. The interchange of two integers, such as 2 and 4 in the above
example, is called a transposition. The number of inversions in a
permutation of n integers is the number of pairs of elements (not necessarily
adjacent) in which a larger integer precedes a smaller one. In our example,
there are three inversions: (4, 3), (4, 2), and (3, 2). It should be noted
that the number of inversions in any permutation of n integers from their
natural order is unique and can be counted directly and systematically.
A permutation is even when the number of inversions is even, and odd
when the number of inversions is odd.
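
Counting inversions "directly and systematically" amounts to examining every pair of positions. A minimal sketch follows; the function names `inversions` and `is_even` are assumptions made for this illustration.

```python
def inversions(perm):
    """List the pairs (larger, smaller) in which the larger integer comes first."""
    return [(perm[p], perm[q])
            for p in range(len(perm))
            for q in range(p + 1, len(perm))
            if perm[p] > perm[q]]

def is_even(perm):
    """A permutation is even when its number of inversions is even."""
    return len(inversions(perm)) % 2 == 0

# The permutation obtained from (1, 2, 3, 4, 5) by interchanging 2 and 4.
pairs = inversions((1, 4, 3, 2, 5))
```

For the permutation in the text this produces the three inversions (4, 3), (4, 2), and (3, 2), so the permutation is odd.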

Consider now the nth-order matrix

\[
\mathbf{A} = \begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{n1} & \cdots & a_{nn} \end{bmatrix}. \tag{3-77}
\]

If one element is selected from each row and column of $\mathbf{A}$, n elements are
obtained. The product of these elements can be written

\[ a_{1i} a_{2j} a_{3k} \cdots a_{nr}, \tag{3-78} \]

where the set of second subscripts $(i, j, k, \ldots, r)$ is a permutation of the
set of integers $(1, 2, \ldots, n)$. To determine the number of possible
different products of this type, we note that there are $n$ choices for $i$; given $i$,
there are $n - 1$ choices for $j$, and given $j$, there are $n - 2$ choices for $k$,
etc. Thus in all there are

\[ n(n - 1)(n - 2) \cdots 1 = n! \tag{3-79} \]

products of type (3-78) which can be formed from the $n^2$ elements of $\mathbf{A}$.
The symbol $n!$ is read n-factorial and is defined by (3-79). All possible
products (3-78) are obtained by using all possible permutations of the
integers $1, 2, \ldots, n$ as the set of second subscripts on the elements.

Example: Let us find all the different products of three elements in
a $3 \times 3$ matrix $\mathbf{A}$ such that any one product contains one and only one
element from each row and column of $\mathbf{A}$. These products can be obtained
from $a_{1i} a_{2j} a_{3k}$ by substituting all permutations of the set of numbers
1, 2, 3 for the set of subscripts $i, j, k$.

We obtain $a_{12} a_{21} a_{33}$, $a_{13} a_{22} a_{31}$, $a_{11} a_{23} a_{32}$, which represent odd
permutations of the second subscripts from the natural order 1, 2, 3, since they
have an odd number of inversions (1 for the first, 3 for the second, 1 for
the third). Then there are the terms $a_{13} a_{21} a_{32}$, $a_{12} a_{23} a_{31}$, which are
even permutations. Finally, we have $a_{11} a_{22} a_{33}$, which is an even
permutation (no inversions); this is the identity permutation, in which the
subscripts are in their natural order.

Hence, six terms can be obtained from the nine elements $a_{ij}$. It will
also be noted that the above terms are precisely those which appeared
in the definition of a third-order determinant. If these terms are added
together, with a plus sign attached to terms representing even
permutations, and a minus sign to those representing odd permutations, we have
arrived at the expansion of a third-order determinant. This is the key
to the general definition of a determinant.

Determinant: The determinant of an nth-order matrix $\mathbf{A} = \|a_{ij}\|$,
written $|\mathbf{A}|$, is defined to be the number computed from the following sum
involving the $n^2$ elements in $\mathbf{A}$:

\[ |\mathbf{A}| = \sum (\pm)\, a_{1i} a_{2j} \cdots a_{nr}, \tag{3-80} \]

the sum being taken over all permutations of the second subscripts. A term
is assigned a plus sign if $(i, j, \ldots, r)$ is an even permutation of
$(1, 2, \ldots, n)$, and a minus sign if it is an odd permutation.

We shall find it convenient to refer to the determinant of an nth-order
matrix as an nth-order or $n \times n$ determinant. If we examine our
definitions of second- and third-order determinants, we see that they follow
directly from the above definition. The third-order case was worked out in
the preceding example. We have shown that there are $n!$ terms in the
summation. One and only one element from each row and column of $\mathbf{A}$
appears in every term of the summation.
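
Definition (3-80) can be evaluated literally by running over all permutations of the second subscripts. The following is a sketch of that literal reading, not an efficient method (the sum has n! terms); the names `sign` and `det_by_definition` are assumptions, and subscripts are 0-based in the code, which does not affect the inversion count.

```python
from itertools import permutations

def sign(perm):
    """+1 for an even permutation, -1 for an odd one, by counting inversions."""
    n = len(perm)
    inv = sum(1 for p in range(n) for q in range(p + 1, n) if perm[p] > perm[q])
    return 1 if inv % 2 == 0 else -1

def det_by_definition(a):
    """Sum of signed products a_1i a_2j ... a_nr over all permutations (3-80)."""
    n = len(a)
    total = 0
    for perm in permutations(range(n)):   # perm holds the second subscripts
        term = sign(perm)
        for row, col in enumerate(perm):  # one element from each row and column
            term *= a[row][col]
        total += term
    return total

n_terms = len(list(permutations(range(4))))  # number of terms for n = 4
```

For a second-order matrix the loop visits exactly the two terms of (3-73), and for a third-order matrix the six terms of (3-76).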

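The definition of a determinant implies that only square matrices have
determinants associated with them. Often we shall write $|\mathbf{A}|$ as

\[
\begin{vmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nn}
\end{vmatrix} \tag{3-81}
\]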

with straight lines denoting the determinant (instead of brackets). The
notational form of (3-80) was used by early writers, such as Jacobi and
Cauchy. Cayley introduced the notation of (3-81). Originally,
determinants were called eliminants. This name indicated more clearly that
they arose on elimination of variables in solving simultaneous linear
equations. Perhaps it should be emphasized again that, while a matrix
has no numerical value, a determinant is a number.

3-12 Some properties of determinants. Before turning to the properties
of determinants, we shall find it helpful to establish the fact that a single
transposition of two elements in any permutation will change an odd
permutation to an even one, and vice versa. To prove this, consider the set
of numbers $a_1, a_2, \ldots, a_n$ representing some permutation of $1, 2, \ldots, n$.
Suppose now that $a_i$ and $a_j$ are interchanged ($j > i$). In doing this, $a_j$
passes over $a_{j-1}, a_{j-2}, \ldots, a_i$, or $j - i$ subscripts. On the other hand,
$a_i$ passes over $a_{i+1}, \ldots, a_{j-1}$, or $j - i - 1$ subscripts. Whenever we
pass over any given index, we either introduce an inversion or remove
one. The total number of inversion changes on a single transposition is
then $2(j - i) - 1$, which is always an odd number. Hence, if the original
permutation was odd, the new permutation is even, and if the original
was even, the new one is odd.

Example: Consider (1, 3, 2, 6, 4, 5). This permutation is odd and the
inversions are (3, 2), (6, 4), (6, 5). Suppose that we interchange 5 and 1
to yield (5, 3, 2, 6, 4, 1). This permutation is even, and the inversions
are: (5, 3), (5, 2), (5, 4), (5, 1), (3, 2), (3, 1), (2, 1), (6, 4), (6, 1), (4, 1).


Using the result just obtained, we can immediately show that an
interchange of two columns in an nth-order matrix $\mathbf{A}$ changes the sign of $|\mathbf{A}|$.
The interchange of two columns interchanges two second subscripts in
each term of (3-80) and changes the sign of each term since in the new
determinant an originally odd permutation becomes an even permutation,
and vice versa.

Example:

\[ |\mathbf{A}| = \begin{vmatrix} 1 & 2 \\ 3 & 4 \end{vmatrix} = -2. \]

If the two columns are interchanged, the new determinant is

\[ |\mathbf{B}| = \begin{vmatrix} 2 & 1 \\ 4 & 3 \end{vmatrix} = 2, \]

and the sign of the determinant is changed.

Next we would like to show that for a square matrix $\mathbf{A}$, $|\mathbf{A}| = |\mathbf{A}'|$;
that is, the determinant of the transposed matrix is the same as the determinant
of the matrix itself. If $\mathbf{B} = \mathbf{A}'$, then a typical term in the expansion of
$|\mathbf{B}|$ is

\[ b_{1i} b_{2j} \cdots b_{nr} = a_{i1} a_{j2} \cdots a_{rn}. \tag{3-82} \]

Hence, for every term in $|\mathbf{B}|$ there is precisely the same term in $|\mathbf{A}|$. It
only remains to be shown that the signs are the same. Clearly, the number
of inversions of the second subscripts of the left-hand side of (3-82) is
the same as the number of inversions of the first subscripts on the
right-hand side. If the elements are rearranged on the right-hand side so that
the first subscripts are in their natural order, then, by symmetry, the
second subscripts have the same number of inversions that the first
subscripts had at the outset. Thus the signs of the corresponding terms are
the same, and

\[ |\mathbf{A}'| = |\mathbf{A}|. \tag{3-83} \]

Example:

\[
|\mathbf{A}| = \begin{vmatrix} 4 & 1 \\ 3 & 2 \end{vmatrix} = 5, \qquad
|\mathbf{A}'| = \begin{vmatrix} 4 & 3 \\ 1 & 2 \end{vmatrix} = 5.
\]

Equation (3-83) demonstrates clearly that, since an interchange of two
columns in $\mathbf{A}$ changes the sign of $|\mathbf{A}|$, an interchange of two rows in $\mathbf{A}$
must also change the sign of $|\mathbf{A}|$.

The result of the interchange of rows and columns points to another
useful property. If a matrix has two rows or two columns which are
identical, then the value of its determinant is zero. Suppose that $a_{ik} =
a_{jk}$ (all $k$) or $a_{ik} = a_{ij}$ (all $i$); if we interchange the two rows or columns
which are the same, $\mathbf{A}$ remains unchanged. However, the sign of the
determinant changes:

\[ |\mathbf{A}| = -|\mathbf{A}| \quad \text{or} \quad |\mathbf{A}| + |\mathbf{A}| = 0. \]

The only real number for which this equation holds is $|\mathbf{A}| = 0$.

Example:

\[ \begin{vmatrix} 1 & 1 \\ 2 & 2 \end{vmatrix} = 0. \]

Multiplication of every element of the ith row or the jth column of $\mathbf{A}$
by a number $\lambda$ multiplies $|\mathbf{A}|$ by $\lambda$. We only have to keep in mind that an
element from each row and column appears once and only once in each
term of (3-80), so that if, for example, $a_{ik}$ is replaced by $\lambda a_{ik}$ for a given $i$
and all $k$, then the whole determinant is multiplied by $\lambda$. Consequently,
if every element of an nth-order matrix $\mathbf{A}$ is multiplied by $\lambda$, then

\[ |\lambda\mathbf{A}| = \lambda^n |\mathbf{A}|. \tag{3-84} \]

It is important to note that it is not true in general that $|\lambda\mathbf{A}| = \lambda|\mathbf{A}|$.
If $\lambda = -1$, then

\[ |-\mathbf{A}| = (-1)^n |\mathbf{A}|. \tag{3-85} \]

The determinant of a square matrix having a row or column whose
elements are all zero clearly is zero. This follows directly from both the
definition and the result of (3-84) if $\lambda$, the factor multiplying a row or
column, is zero.

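Example:

\[
|\mathbf{A}| = \begin{vmatrix} 2 & 3 \\ 1 & 4 \end{vmatrix} = 5, \qquad
|\mathbf{B}| = \begin{vmatrix} 4 & 3 \\ 2 & 4 \end{vmatrix} = 10, \qquad
|\mathbf{B}| = 2|\mathbf{A}|;
\]
\[
|2\mathbf{A}| = \begin{vmatrix} 4 & 6 \\ 2 & 8 \end{vmatrix} = 20, \qquad
|2\mathbf{A}| = 4|\mathbf{A}|.
\]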
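
The properties established in this section lend themselves to a numerical spot check. The sketch below evaluates determinants from the permutation definition (3-80) and tests each property on one matrix; the matrix and all helper names are illustrative assumptions, and a check on one matrix is of course not a proof.

```python
from itertools import permutations

def det(a):
    """Determinant from the permutation definition (3-80)."""
    n = len(a)
    total = 0
    for perm in permutations(range(n)):
        inv = sum(perm[p] > perm[q] for p in range(n) for q in range(p + 1, n))
        term = -1 if inv % 2 else 1
        for row, col in enumerate(perm):
            term *= a[row][col]
        total += term
    return total

def transpose(a):
    return [list(col) for col in zip(*a)]

def swap_columns(a, p, q):
    """Return a copy of a with columns p and q interchanged."""
    out = [list(row) for row in a]
    for row in out:
        row[p], row[q] = row[q], row[p]
    return out

def scale(a, lam):
    """Multiply every element of a by lam."""
    return [[lam * x for x in row] for row in a]

A = [[2, 3, 1], [1, 4, 0], [5, 2, 6]]   # illustrative matrix, not from the text
n = len(A)

checks = {
    "column interchange changes the sign": det(swap_columns(A, 0, 2)) == -det(A),
    "|A'| = |A|": det(transpose(A)) == det(A),
    "|lam A| = lam^n |A|": det(scale(A, 2)) == 2 ** n * det(A),
    "two identical rows give zero": det([A[0], A[0], A[2]]) == 0,
    "a zero row gives zero": det([A[0], [0, 0, 0], A[2]]) == 0,
}
```

Every entry of `checks` should be `True`; for a third-order matrix the factor in (3-84) is 2^3 = 8, not 2.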


3-13 Expansion by cofactors. It is not easy to evaluate numerically a
determinant by the definition (3-80) if n is large. The task of finding all the
permutations and assigning the proper sign is a difficult one. Hence, we
shall develop another method of evaluating a determinant, which is of
considerable theoretical importance and simplifies the procedure.
However, it is by no means efficient. A more adequate numerical procedure
is developed in Problem 4-21.

In (3-80) an element from row $i$ of $\mathbf{A}$ appears in every term of the
summation. If the elements are factored out, the determinant of $\mathbf{A}$ can
be written*

\[ |\mathbf{A}| = \sum_{j=1}^{n} a_{ij} A_{ij}, \tag{3-86} \]

where $i$ can be any row. Each $A_{ij}$ is the sum of $(n - 1)!$ terms involving
the products of $n - 1$ elements from $\mathbf{A}$, the sum being taken over all
possible permutations of the second subscripts. In addition, there is a
plus or minus sign attached to each term, depending on whether $a_{ij}$ times
that term yields an even or odd permutation of $(1, 2, \ldots, n)$ as the set of
second subscripts $(q, k, \ldots, r)$. In $A_{ij}$ there are no elements from row $i$
or column $j$ of $\mathbf{A}$ since these two subscripts appear in $a_{ij}$, and only one
element from each row and column of $\mathbf{A}$ can occur in any term.

Example: In expanding a third-order determinant in the form of (3-86),
we see from (3-76) that if $i = 1$ (row 1),

\[
A_{11} = a_{22} a_{33} - a_{23} a_{32}, \qquad
A_{12} = -(a_{21} a_{33} - a_{23} a_{31}), \qquad
A_{13} = a_{21} a_{32} - a_{22} a_{31},
\]

and

\[ |\mathbf{A}| = a_{11} A_{11} + a_{12} A_{12} + a_{13} A_{13}. \]

If $i = 3$ (row 3), then

\[
A_{31} = a_{12} a_{23} - a_{13} a_{22}, \qquad
A_{32} = -(a_{11} a_{23} - a_{13} a_{21}), \qquad
A_{33} = a_{11} a_{22} - a_{12} a_{21},
\]

and

\[ |\mathbf{A}| = a_{31} A_{31} + a_{32} A_{32} + a_{33} A_{33}. \]

* It is not easy to follow the material in this section in the abstract. We
suggest therefore that the reader simultaneously work through an example
such as Problem 3-18. Note that in discussing the second subscripts, we always
assume that the first subscripts are in their natural order. Also, when the second
subscript $j$ is fixed to position $i$, then the question whether the permutation of
the second subscripts is odd or even depends only on the positions of the $n - 1$
second subscripts other than $j$.


The description of the $A_{ij}$ indicates that $A_{ij}$ can be considered as a
determinant of order $n - 1$. All elements of $\mathbf{A}$ appear in $A_{ij}$ except those
in row $i$ and column $j$. In fact, except for a possible difference in sign,
$A_{ij}$ is the determinant of the submatrix formed from $\mathbf{A}$ by crossing out
row $i$ and column $j$. If the reader examines the $A_{ij}$ in our example, he will
see that in the $3 \times 3$ case this is indeed true.

Let us now determine the sign that should be assigned to the
determinant of the submatrix obtained by crossing out row $i$ and column $j$
of $\mathbf{A}$ in order to convert it to $A_{ij}$. We do this most simply by moving row
$i$ to row 1 by $i - 1$ interchanges of rows. The other rows, while retaining
their original order, will be moved down one. Then, by $j - 1$
interchanges, column $j$ will be moved to the position of column 1. We shall
call this new matrix $\mathbf{B}$. Then

\[ |\mathbf{B}| = (-1)^{i+j-2} |\mathbf{A}| = (-1)^{i+j} |\mathbf{A}|. \tag{3-87} \]

However, the product of the elements on the main diagonal of $\mathbf{B}$ has a
plus sign when $|\mathbf{B}|$ is written in the form of (3-80). Now $b_{11} = a_{ij}$, and
the remaining elements on the main diagonal of $\mathbf{B}$ are the same as those
appearing on the main diagonal of the submatrix whose determinant is
$A_{ij}$. But, from (3-87), $(-1)^{i+j}$ times this term in the expansion of $|\mathbf{A}|$
must be positive. Therefore, $A_{ij}$ is $(-1)^{i+j}$ times the determinant of the
submatrix formed from $\mathbf{A}$ by deleting row $i$ and column $j$. The number
$A_{ij}$ is called the cofactor of $a_{ij}$.

Cofactor: The cofactor $A_{ij}$ of the element $a_{ij}$ of any square matrix $\mathbf{A}$ is
$(-1)^{i+j}$ times the determinant of the submatrix obtained from $\mathbf{A}$ by deleting
row $i$ and column $j$.

We have now provided a way to evaluate the $A_{ij}$ in expansion (3-86)
of $|\mathbf{A}|$. This important method of evaluating determinants is called
expansion by cofactors. Thus, for example, expression (3-86) is said to be
an expansion by row $i$ of $\mathbf{A}$. To make (3-86) consistent with (3-80) for
the case of $n = 1$, the cofactor of the element in a first-order matrix
must be defined to be 1 since (3-80) requires that $|\mathbf{A}| = a_{11}$ for the
matrix $\mathbf{A} = (a_{11})$.

The use of the expansion by cofactors [Eq. (3-86)] reduces the
problem of evaluating an nth-order determinant to that of evaluating $n$
determinants of order $n - 1$, which are the cofactors $A_{ij}$ when $j = 1, \ldots, n$.
Thus, proceeding by steps, we arrive at an evaluation of $|\mathbf{A}|$. Application of
the cofactor-expansion method to the cofactors reduces the task of
evaluating each cofactor of order $n - 1$ to that of evaluating $n - 1$
determinants of order $n - 2$, etc. The expansion by cofactors is not a very
efficient numerical procedure for evaluating determinants of high order.
It is, however, of considerable theoretical value.
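
The step-by-step reduction just described is naturally recursive. The following sketch expands along an arbitrary row; the function names are assumptions, and indices are 0-based in the code, which leaves the sign $(-1)^{i+j}$ unchanged because both subscripts shift by one.

```python
def minor_matrix(a, i, j):
    """Submatrix of a obtained by deleting row i and column j (0-based)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(a) if k != i]

def cofactor(a, i, j):
    """(-1)^(i+j) times the determinant of the submatrix with row i, column j deleted."""
    return (-1) ** (i + j) * det(minor_matrix(a, i, j))

def det(a, row=0):
    """Expansion of |A| by the given row, as in (3-86)."""
    if len(a) == 1:
        return a[0][0]          # first-order case: |A| = a_11
    return sum(a[row][j] * cofactor(a, row, j) for j in range(len(a)))

A = [[1, 2, 3], [4, 5, 6], [7, 8, 10]]   # illustrative matrix, not from the text
values = [det(A, row=r) for r in range(3)]
```

Expanding by any of the three rows gives the same value, as (3-86) asserts. The recursion evaluates on the order of n! products, which is the inefficiency noted in the text.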

We find it useful to give a name to the kth-order determinants,
$k = 1, \ldots, n - 1$, which appear in the step-by-step evaluation of $|\mathbf{A}|$
by cofactor expansion. They will be referred to as minors of $\mathbf{A}$. More
precisely:

Minor of order k: For any $m \times n$ matrix $\mathbf{A}$ consider the kth-order
submatrix $\mathbf{R}$ obtained by deleting all but some $k$ rows and $k$ columns of $\mathbf{A}$.
Then $|\mathbf{R}|$ is called a kth-order minor of $\mathbf{A}$.

Note: In this definition $\mathbf{A}$ is not required to be a square matrix.

Instead of writing $|\mathbf{A}|$ in the form (3-86), we could have shown equally
well that

\[ |\mathbf{A}| = \sum_{i=1}^{n} a_{ij} A_{ij}, \tag{3-88} \]

where $A_{ij}$ is again the cofactor of $a_{ij}$. This is called an expansion of $|\mathbf{A}|$
by column $j$ of $\mathbf{A}$.

Examples: (1) Let us expand

\[
|\mathbf{A}| = \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix}
\]

in cofactors by the second row. This expansion should read

\[ |\mathbf{A}| = a_{21} A_{21} + a_{22} A_{22} + a_{23} A_{23}. \]

The cofactor $A_{ij}$ is found by crossing out row $i$ and column $j$ and by
multiplying the resulting determinant by $(-1)^{i+j}$. Hence

\[
A_{21} = (-1)^{2+1} \begin{vmatrix} a_{12} & a_{13} \\ a_{32} & a_{33} \end{vmatrix} = a_{32} a_{13} - a_{12} a_{33},
\]
\[
A_{22} = (-1)^{2+2} \begin{vmatrix} a_{11} & a_{13} \\ a_{31} & a_{33} \end{vmatrix} = a_{11} a_{33} - a_{13} a_{31},
\]
\[
A_{23} = (-1)^{2+3} \begin{vmatrix} a_{11} & a_{12} \\ a_{31} & a_{32} \end{vmatrix} = a_{12} a_{31} - a_{11} a_{32}.
\]

Thus

\[
|\mathbf{A}| = a_{21}(a_{32} a_{13} - a_{12} a_{33}) + a_{22}(a_{11} a_{33} - a_{13} a_{31})
+ a_{23}(a_{12} a_{31} - a_{11} a_{32}).
\]

If the definition of a third-order determinant (3-76) is examined, it will be
seen that the above result is identical with (3-76); hence, the correct
expansion of $|\mathbf{A}|$ has been obtained.

(2) The method of expansion by cofactors facilitates the evaluation
of $|\mathbf{I}_n|$. We expand by the first row. This gives

\[ |\mathbf{I}_n| = (1)\,|\mathbf{I}_{n-1}|. \]

Expanding $|\mathbf{I}_{n-1}|$, etc., in the same way, we finally obtain

\[ |\mathbf{I}_n| = (1)(1) \cdots (1) = 1. \]

Thus the determinant of any identity matrix is 1.

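3-14 Additional properties of determinants. Expansion by cofactors
can be used to prove some additional properties of determinants.
Expanding by the first column, we can immediately see that

\[
\begin{vmatrix}
(\lambda_1 a_{11} + \lambda_2 b_{11} + \lambda_3 c_{11}) & a_{12} & \cdots & a_{1n} \\
(\lambda_1 a_{21} + \lambda_2 b_{21} + \lambda_3 c_{21}) & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & & \vdots \\
(\lambda_1 a_{n1} + \lambda_2 b_{n1} + \lambda_3 c_{n1}) & a_{n2} & \cdots & a_{nn}
\end{vmatrix}
= \lambda_1
\begin{vmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nn}
\end{vmatrix}
\]
\[
+ \lambda_2
\begin{vmatrix}
b_{11} & a_{12} & \cdots & a_{1n} \\
b_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & & \vdots \\
b_{n1} & a_{n2} & \cdots & a_{nn}
\end{vmatrix}
+ \lambda_3
\begin{vmatrix}
c_{11} & a_{12} & \cdots & a_{1n} \\
c_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & & \vdots \\
c_{n1} & a_{n2} & \cdots & a_{nn}
\end{vmatrix}, \tag{3-89}
\]

since

\[
\sum_i (\lambda_1 a_{i1} + \lambda_2 b_{i1} + \lambda_3 c_{i1}) A_{i1}
= \lambda_1 \sum_i a_{i1} A_{i1} + \lambda_2 \sum_i b_{i1} A_{i1} + \lambda_3 \sum_i c_{i1} A_{i1}.
\]

The same sort of result holds when the ith row or jth column of $\mathbf{A}$ is written
as a sum of terms.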

Example:

\[
\begin{vmatrix} 6 & 2 \\ 5 & 1 \end{vmatrix}
= \begin{vmatrix} (3 + 3) & 2 \\ (4 + 1) & 1 \end{vmatrix}
= \begin{vmatrix} 3 & 2 \\ 4 & 1 \end{vmatrix} + \begin{vmatrix} 3 & 2 \\ 1 & 1 \end{vmatrix};
\qquad -4 = -5 + 1 = -4.
\]
\[
\begin{vmatrix} 6 & 2 \\ 5 & 1 \end{vmatrix}
= \begin{vmatrix} (4 + 2) & (2 + 0) \\ 5 & 1 \end{vmatrix}
= \begin{vmatrix} 4 & 2 \\ 5 & 1 \end{vmatrix} + \begin{vmatrix} 2 & 0 \\ 5 & 1 \end{vmatrix};
\qquad -4 = -6 + 2 = -4.
\]

It should be observed that if $\mathbf{A}$, $\mathbf{B}$ are nth-order matrices, it is definitely
not true that $|\mathbf{A} + \mathbf{B}|$ is the same as $|\mathbf{A}| + |\mathbf{B}|$ in all cases.

\[ |\mathbf{A} + \mathbf{B}| \ne |\mathbf{A}| + |\mathbf{B}| \quad \text{(in general)}. \tag{3-90} \]

From result (3-89) and the fact that a determinant vanishes if any two
rows or columns are the same, it is easy to see that adding a multiple of
row $k$ to row $i$ ($i \ne k$) or a multiple of column $k$ to column $i$ ($i \ne k$) of $\mathbf{A}$
does not change the value of $|\mathbf{A}|$.

This can be easily proved: On expanding by row or column $i$ we get $|\mathbf{A}|$
plus a constant times the determinant of a matrix with two identical
rows or two identical columns. This determinant vanishes, and hence
$|\mathbf{A}|$ is left unchanged. Consequently, if one column (or row) of a square
matrix $\mathbf{A}$ is a linear combination of the other columns (or rows), the value
of $|\mathbf{A}|$ is zero.
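
Both statements can be tried out numerically. The sketch below adds a multiple of one row to another and builds a matrix whose third row is a linear combination of the first two; the helper names, the multiplier 3, and the rows `r1`, `r2` are assumptions chosen for illustration.

```python
def det(a):
    """Cofactor expansion by the first row; the empty determinant is taken as 1."""
    if not a:
        return 1
    return sum((-1) ** j * a[0][j] * det([row[:j] + row[j + 1:] for row in a[1:]])
               for j in range(len(a)))

def add_multiple_of_row(a, i, k, lam):
    """Return a copy of a in which lam times row k has been added to row i."""
    if i == k:
        raise ValueError("rows i and k must be different")
    out = [list(row) for row in a]
    out[i] = [x + lam * y for x, y in zip(a[i], a[k])]
    return out

A = [[6, 2], [5, 1]]                      # the matrix used in the examples above
B = add_multiple_of_row(A, 0, 1, 3)       # row 1 + 3 * row 2

# Third row is 2 * (first row) - (second row), a linear combination.
r1, r2 = [1, 2, 3], [0, 1, 4]
C = [r1, r2, [2 * x - y for x, y in zip(r1, r2)]]
```

The determinant of `B` equals that of `A`, and the determinant of `C` vanishes.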


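Examples:

\[
(1) \quad \begin{vmatrix} 6 & 2 \\ 5 & 1 \end{vmatrix}
= \begin{vmatrix} (6 + 2\lambda) & 2 \\ (5 + \lambda) & 1 \end{vmatrix}
= \begin{vmatrix} 6 & 2 \\ 5 & 1 \end{vmatrix} + \lambda \begin{vmatrix} 2 & 2 \\ 1 & 1 \end{vmatrix}.
\]
\[
(2) \quad |\mathbf{A}| =
\begin{vmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nn}
\end{vmatrix}
=
\begin{vmatrix}
(a_{11} + 5a_{n1}) & \cdots & (a_{1n} + 5a_{nn}) \\
a_{21} & \cdots & a_{2n} \\
\vdots & & \vdots \\
a_{n1} & \cdots & a_{nn}
\end{vmatrix}.
\]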

The fact that a determinant $|\mathbf{A}|$ vanishes if any two rows or columns
of $\mathbf{A}$ are the same leads us to a very important result concerning the
expansion by cofactors. We have shown that

\[ |\mathbf{A}| = \sum_j a_{ij} A_{ij} = \sum_i a_{ij} A_{ij}. \tag{3-91} \]

Now consider the expression

\[ \sum_j a_{kj} A_{ij} \qquad (k \ne i). \]

We are using the cofactors for row $i$ and the elements of row $k$. This is
exactly the expansion of a determinant where rows $i$ and $k$ are the same;
hence, the above expression vanishes, that is,

\[ \sum_j a_{kj} A_{ij} = \sum_j a_{ji} A_{jk} = 0 \qquad (i \ne k). \tag{3-92} \]

Equations (3-91) and (3-92) can be combined:

\[ \sum_j a_{ij} A_{kj} = \sum_j a_{ji} A_{jk} = |\mathbf{A}|\,\delta_{ki}, \tag{3-93} \]

where $\delta_{ki}$ is the Kronecker delta defined earlier. We have shown that an
expansion by a row or column $i$ in terms of the cofactors of row or column $k$
vanishes when $i \ne k$, and is equal to $|\mathbf{A}|$ when $i = k$. This result will be
used a number of times in the future.

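Example:

\[
|\mathbf{A}| = \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix};
\]
\[
A_{11} = \begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix}, \qquad
A_{12} = -\begin{vmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{vmatrix}, \qquad
A_{13} = \begin{vmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{vmatrix}.
\]

We expand by row 2, using the cofactors of row 1, and obtain, as expected:

\[
a_{21}(a_{22} a_{33} - a_{23} a_{32}) - a_{22}(a_{21} a_{33} - a_{23} a_{31}) + a_{23}(a_{21} a_{32} - a_{22} a_{31}) = 0.
\]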

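It will be noted that (3-93) looks very much like a matrix product. If
we set $a^+_{ij} = A_{ji}$, then (3-93) becomes

\[ \sum_j a_{ij} a^+_{jk} = \sum_j a^+_{kj} a_{ji} = |\mathbf{A}|\,\delta_{ki}. \tag{3-94} \]

If we write

\[
\mathbf{A}^+ = \|a^+_{ij}\| =
\begin{bmatrix}
A_{11} & A_{21} & \cdots & A_{n1} \\
A_{12} & A_{22} & \cdots & A_{n2} \\
\vdots & \vdots & & \vdots \\
A_{1n} & A_{2n} & \cdots & A_{nn}
\end{bmatrix}, \tag{3-95}
\]

then Eq. (3-94) can be written in matrix form:

\[ \mathbf{A}\mathbf{A}^+ = \mathbf{A}^+\mathbf{A} = |\mathbf{A}|\,\mathbf{I}_n. \tag{3-96} \]

The matrix $\mathbf{A}^+$ is called the adjoint of matrix $\mathbf{A}$. In fact, $\mathbf{A}^+$ is the
transpose of the matrix obtained from $\mathbf{A}$ by replacing each element $a_{ij}$ by its
cofactor $A_{ij}$.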
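
Relation (3-96) is easy to verify on a small matrix. The following sketch builds the adjoint as the transpose of the matrix of cofactors and multiplies it by the original matrix on both sides; the function names and the test matrix are assumptions made for the illustration.

```python
def det(a):
    """Cofactor expansion by the first row; the empty determinant is taken as 1,
    which matches the convention that the cofactor in a first-order matrix is 1."""
    if not a:
        return 1
    return sum((-1) ** j * a[0][j] * det([row[:j] + row[j + 1:] for row in a[1:]])
               for j in range(len(a)))

def cofactor(a, i, j):
    """Cofactor A_ij (0-based indices) of the element a_ij."""
    sub = [row[:j] + row[j + 1:] for k, row in enumerate(a) if k != i]
    return (-1) ** (i + j) * det(sub)

def adjoint(a):
    """A+ = ||a+_ij|| with a+_ij = A_ji, the transpose of the cofactor matrix."""
    n = len(a)
    return [[cofactor(a, j, i) for j in range(n)] for i in range(n)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

A = [[2, 0, 1], [1, 3, 2], [1, 1, 2]]   # illustrative matrix, not from the text
A_plus = adjoint(A)
d = det(A)
scaled_identity = [[d if i == j else 0 for j in range(3)] for i in range(3)]
```

Both `matmul(A, A_plus)` and `matmul(A_plus, A)` should equal `scaled_identity`: the diagonal entries are the expansions (3-91) and the off-diagonal entries are the vanishing sums (3-92).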


*3-15 Laplace expansion. In this section, we shall consider another, 
more general, technique of expanding a determinant known as the Laplace 
expansion method, which includes, as a special case, the expansion by co¬ 
factors. Instead of expanding by a single row or column, we now expand 
by several rows or columns. The determinant |A| is written as the sum 
of terms, each of which is the product of two determinants. 


* Starred sections can be omitted without loss of continuity. 
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We begin by considering the first m rows (m < n) and columns of A. 
If we collect the terms containing ana 2 2 • . . a mm in the expansion of |A|, 
and factor out this quantity, we are left with 

+ * * * &nrj (3-97) 


the sum being taken over all permutations of the second subscripts. This 
sum, however, represents the determinant of the submatrix formed from A 
by crossing out the first m rows and columns. Note that the sum (3-97) 
will be obtained also if we collect all terms containing ai u a 2v - . . a m w 
[{u y v y . . . , w) represents a permutation of (1, 2, ... , m)] and factor out 
a\ u a 2v - • • Umwj- The sign will alternate, depending on whether the permu¬ 
tation (Uy Vy . . . , w) is even or odd. Thus, the following terms appear 
in the expansion of |A|: 

(i l)(lxw^2v ’ * ’ l)®w+l iO / m+2j * * ’ (3~98) 

where 


2 (± 1 )« 1 w « 2 v ®mw 


is the determinant of the submatrix formed from the first m rows and 
columns of A. Expression (3-98) is thus the product of the determinant 
of the submatrix formed from the first m rows and columns and the 
determinant of the submatrix formed by crossing out the first m rows and 
columns. We have the correct sign, since in the expansion of |A| the term 
d 11&2 2 ®nn has a plus sign. 

Next, we shall consider the m × m submatrix formed from rows
$i_1, i_2, \ldots, i_m$ and columns $j_1, j_2, \ldots, j_m$. Except for the sign, the expansion
of |A| will contain the product of the determinant of this submatrix and
the determinant of the submatrix formed by crossing out rows $i_1, i_2, \ldots, i_m$
and columns $j_1, j_2, \ldots, j_m$. The sign of the product is determined by
the method used in the expansion by cofactors. The m × m submatrix is
moved so that it occupies the first m rows and columns. Let us assume
quite logically that $i_1 < i_2 < \cdots < i_m$ and $j_1 < j_2 < \cdots < j_m$. Then
after $(i_1 - 1) + (i_2 - 2) + \cdots + (i_m - m)$ interchanges of rows and
$(j_1 - 1) + (j_2 - 2) + \cdots + (j_m - m)$ interchanges of columns, the
m × m submatrix formed from rows $i_1, \ldots, i_m$ and columns $j_1, \ldots, j_m$
lies in rows 1, ..., m and columns 1, ..., m. Furthermore, the order of
the remaining columns has not been changed. Once this rearrangement
is completed, we are back at the case already considered. The sign
depends on

\[ \sum_{k=1}^{m} (i_k + j_k) - 2(1 + 2 + \cdots + m). \]


However,

\[ 1 + 2 + \cdots + m = \tfrac{1}{2} m(m + 1). \]

Note: To prove $S = 1 + 2 + \cdots + m = \tfrac{1}{2} m(m + 1)$, simply write

\[ S = 1 + 2 + \cdots + m; \]

\[ S = m + (m - 1) + \cdots + 1; \]

addition yields $2S = m(m + 1)$.

Since m(m + 1) is always even, the sign to be attached to the product of
the two determinants is

\[ (-1)^{\sum_{k=1}^{m} (i_k + j_k)}. \]

Two definitions will be helpful. 

Complementary minor: Given the nth-order matrix A. The determinant
of the (n − m)th-order submatrix P formed by crossing out rows
$i_1, \ldots, i_m$ and columns $j_1, \ldots, j_m$ is called the complementary minor of
the mth-order submatrix N formed from rows $i_1, \ldots, i_m$ and columns
$j_1, \ldots, j_m$.

Complementary cofactor: With N, P as defined above, the determinant

\[ |M| = (-1)^{\sum_{k=1}^{m} (i_k + j_k)}\, |P| \tag{3-99} \]

is called the complementary cofactor of N in A.

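Example: Consider

\[ A = \begin{bmatrix}
a_{11} & a_{12} & a_{13} & a_{14} & a_{15} \\
a_{21} & a_{22} & a_{23} & a_{24} & a_{25} \\
a_{31} & a_{32} & a_{33} & a_{34} & a_{35} \\
a_{41} & a_{42} & a_{43} & a_{44} & a_{45} \\
a_{51} & a_{52} & a_{53} & a_{54} & a_{55}
\end{bmatrix}. \]

The determinant of the submatrix N formed from columns 2 and 5 and
rows 1 and 3 is

\[ |N| = \begin{vmatrix} a_{12} & a_{15} \\ a_{32} & a_{35} \end{vmatrix}. \]

The complementary minor is

\[ |P| = \begin{vmatrix} a_{21} & a_{23} & a_{24} \\ a_{41} & a_{43} & a_{44} \\ a_{51} & a_{53} & a_{54} \end{vmatrix}. \]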

Hence $\sum (i_k + j_k) = 2 + 5 + 1 + 3 = 11$, and the complementary
cofactor of N is

\[ |M| = -|P|. \]

From these results we can immediately derive a new way of expanding
the determinant |A|. Select any m rows of A. From these m rows we can
form n!/[m!(n − m)!] different mth-order submatrices, where n!/[m!(n − m)!]
is the number of combinations of the n columns of A taken m at a time.
In choosing these submatrices N, we always keep the order of the columns
in A unchanged. For each submatrix N we find |N| and the corresponding
complementary cofactor |M|; then we form the product |N| |M|. From
|N|, we obtain m! terms and from |M|, (n − m)! terms. Hence each product
|N| |M| yields, with the correct sign, m!(n − m)! terms of the expansion
of |A|. In all, there are n!/[m!(n − m)!] products |N| |M|, which yield a
total of

\[ \frac{n!}{m!\,(n - m)!}\; m!\,(n - m)! = n! \]

terms in the expansion of |A|.

Thus, we have obtained all terms in |A| since our method of selecting the
mth-order submatrices from the m rows eliminates any possible repetition
of terms. Hence we can write

\[ |A| = \sum_{j_1 < j_2 < \cdots < j_m} |N(i_1, \ldots, i_m \mid j_1, \ldots, j_m)|\; |M|, \tag{3-100} \]

where $|N(i_1, \ldots, i_m \mid j_1, \ldots, j_m)|$ is the mth-order determinant of the
submatrix formed from rows $i_1, \ldots, i_m$ and columns $j_1, \ldots, j_m$. The
sum is taken over the n!/[m!(n − m)!] choices for $j_1, \ldots, j_m$. The notation
$j_1 < j_2 < \cdots < j_m$ indicates that the sum is taken over all choices
of the columns such that the column order is maintained. This technique
of expansion is called the Laplace method: We select any m rows from A
(note that they do not need to be adjacent). From these m rows we form
the n!/[m!(n − m)!] possible mth-order determinants and find their
individual complementary cofactors. We then multiply each determinant
by its complementary cofactor and add the n!/[m!(n − m)!] terms to
obtain |A|. This is an example of expansion by rows $i_1, \ldots, i_m$.

We can, of course, expand by any m columns and obtain

\[ |A| = \sum_{i_1 < i_2 < \cdots < i_m} |N(i_1, \ldots, i_m \mid j_1, \ldots, j_m)|\; |M|. \tag{3-101} \]

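Example: Let us expand the determinant

\[ \begin{vmatrix}
a_{11} & a_{12} & a_{13} & a_{14} \\
a_{21} & a_{22} & a_{23} & a_{24} \\
a_{31} & a_{32} & a_{33} & a_{34} \\
a_{41} & a_{42} & a_{43} & a_{44}
\end{vmatrix} \]

by the first and last rows. There will be 4!/(2! 2!) = 6 terms. The
determinants of order 2 which can be formed from rows 1, 4 are

\[ |N_1| = \begin{vmatrix} a_{11} & a_{12} \\ a_{41} & a_{42} \end{vmatrix}, \quad
|N_2| = \begin{vmatrix} a_{11} & a_{13} \\ a_{41} & a_{43} \end{vmatrix}, \quad
|N_3| = \begin{vmatrix} a_{11} & a_{14} \\ a_{41} & a_{44} \end{vmatrix}, \]

\[ |N_4| = \begin{vmatrix} a_{12} & a_{13} \\ a_{42} & a_{43} \end{vmatrix}, \quad
|N_5| = \begin{vmatrix} a_{12} & a_{14} \\ a_{42} & a_{44} \end{vmatrix}, \quad
|N_6| = \begin{vmatrix} a_{13} & a_{14} \\ a_{43} & a_{44} \end{vmatrix}. \]

The corresponding complementary cofactors are

\[ |M_1| = \begin{vmatrix} a_{23} & a_{24} \\ a_{33} & a_{34} \end{vmatrix}, \quad
|M_2| = -\begin{vmatrix} a_{22} & a_{24} \\ a_{32} & a_{34} \end{vmatrix}, \quad
|M_3| = \begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix}, \]

\[ |M_4| = \begin{vmatrix} a_{21} & a_{24} \\ a_{31} & a_{34} \end{vmatrix}, \quad
|M_5| = -\begin{vmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{vmatrix}, \quad
|M_6| = \begin{vmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{vmatrix}; \]

\[ |A| = \sum_{k=1}^{6} |N_k|\, |M_k|. \]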

The sign of $|M_2|$, for example, is found as follows:

\[ \sum (i_k + j_k) \ (\text{for } N_2) = 4 + 1 + 1 + 3 = 9; \]

hence the minor must be multiplied by (−1), as shown.
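
The Laplace expansion (3-100) can be checked numerically. The following is a minimal sketch, not part of the original text: the helper names `det` and `laplace_by_rows` and the integer test matrix are assumptions chosen for illustration. Rows and columns are 0-based in the code, so 1 is added to each index when forming the sign exponent $\sum (i_k + j_k)$.

```python
from itertools import combinations


def det(M):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    if not M:
        return 1  # determinant of the empty (0 x 0) matrix
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))


def laplace_by_rows(A, rows):
    """Expand |A| by the given rows (0-based, increasing), following (3-100)."""
    n, m = len(A), len(rows)
    other_rows = [i for i in range(n) if i not in rows]
    total = 0
    for cols in combinations(range(n), m):      # keeps j_1 < j_2 < ... < j_m
        other_cols = [j for j in range(n) if j not in cols]
        N = [[A[i][j] for j in cols] for i in rows]
        P = [[A[i][j] for j in other_cols] for i in other_rows]
        # sign of the complementary cofactor, using 1-based indices as in (3-99)
        sign = (-1) ** (sum(i + 1 for i in rows) + sum(j + 1 for j in cols))
        total += det(N) * sign * det(P)
    return total


# Hypothetical 4 x 4 matrix, expanded by its first and last rows as in the example.
A = [[2, 0, 1, 3],
     [1, 4, 0, 2],
     [0, 1, 5, 1],
     [3, 2, 1, 0]]
print(laplace_by_rows(A, (0, 3)) == det(A))
```

Any other choice of rows, adjacent or not, should give the same value.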


*3-16 Multiplication of determinants. There is a simple multiplication
relation for the determinant of the product of square matrices. If A, B
are matrices of order n, then, if C = AB,

\[ |C| = |A|\, |B|, \tag{3-102} \]

that is, the determinant of the product is the product of the determinants.
To prove (3-102), consider the partitioned matrix of size 2n × 2n,

\[ D = \begin{bmatrix} A & 0 \\ -I_n & B \end{bmatrix}. \]

Applying Laplace's expansion by the last n rows to the determinant |D|,
we obtain

\[ |D| = \begin{vmatrix} A & 0 \\ -I_n & B \end{vmatrix} = (-1)^{2n^2 + n(n+1)}\, |A|\, |B| = |A|\, |B|, \tag{3-103} \]

since the complementary minor of any submatrix including one of the
first n columns will have a column of zeros. The determinant is
completely independent of the matrix appearing in the lower left of (3-103).
Matrix $-I_n$ was placed there for the following special reason:
Let us take the columns of B to be $b_j$. Then consider

\[ \begin{bmatrix} A \\ -I_n \end{bmatrix} b_1 = \begin{bmatrix} A b_1 \\ -b_1 \end{bmatrix}. \tag{3-104} \]

Equation (3-104) is a linear combination of the first n columns of D. We
can add (3-104) to column n + 1 of D without changing the value of
its determinant. This yields

\[ \begin{bmatrix} A b_1 \\ 0 \end{bmatrix} \]

as the new column n + 1. Continuing this process and adding

\[ \begin{bmatrix} A b_j \\ -b_j \end{bmatrix} \tag{3-105} \]

to the (n + j)th column of D, we get

\[ |D| = \begin{vmatrix} A & AB \\ -I_n & 0 \end{vmatrix}. \tag{3-106} \]

We expand by the last n rows, using the Laplace expansion, and obtain

\[ |D| = (-1)^{n^2 + n(n+1)}\, |-I_n|\, |AB| = (-1)^{2n(n+1)}\, |AB| = |AB|. \tag{3-107} \]

Thus, from (3-103) we have proved that

\[ |AB| = |A|\, |B|. \tag{3-108} \]

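Example:

\[ A = \begin{bmatrix} 2 & 3 \\ 1 & 4 \end{bmatrix}, \quad
B = \begin{bmatrix} 1 & 6 \\ 3 & 2 \end{bmatrix}, \quad
C = AB = \begin{bmatrix} 11 & 18 \\ 13 & 14 \end{bmatrix}; \]

\[ |A| = 5, \quad |B| = -16, \quad |C| = -80; \]

\[ |C| = -80 = |A|\, |B| = 5(-16). \]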
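
The example, and the partitioned matrix D used in the proof, can be reproduced with a few lines of code. This is a minimal sketch under stated assumptions: `det` and `matmul` are illustrative helpers, not notation from the text, and D is assembled as in (3-103) with $-I_n$ in the lower left.

```python
def det(M):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    if not M:
        return 1
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))


def matmul(X, Y):
    """Ordinary matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


A = [[2, 3], [1, 4]]
B = [[1, 6], [3, 2]]
C = matmul(A, B)
n = len(A)

# D = [[A, 0], [-I_n, B]], the 2n x 2n partitioned matrix of (3-103)
D = [A[i] + [0] * n for i in range(n)] + \
    [[-1 if j == i else 0 for j in range(n)] + B[i] for i in range(n)]

print(C)                                  # [[11, 18], [13, 14]]
print(det(A), det(B), det(C), det(D))     # 5 -16 -80 -80
```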


*3-17 Determinant of the product of rectangular matrices. Let A be an
m × n matrix and B an n × m matrix with m < n. If C = AB, |C|
is an mth-order determinant. We shall now show how to express |C| in
terms of the determinants of order m which can be formed from A and B.
Consider the product

\[ \begin{bmatrix} I_m & A \\ 0 & I_n \end{bmatrix}
\begin{bmatrix} A & 0 \\ -I_n & B \end{bmatrix}
= \begin{bmatrix} 0 & AB \\ -I_n & B \end{bmatrix}. \tag{3-109} \]

Equation (3-108) shows that

\[ \begin{vmatrix} I_m & A \\ 0 & I_n \end{vmatrix}\,
\begin{vmatrix} A & 0 \\ -I_n & B \end{vmatrix}
= \begin{vmatrix} 0 & AB \\ -I_n & B \end{vmatrix}. \tag{3-110} \]

Laplace's expansion indicates immediately that the first determinant has
the value unity, and hence

\[ \begin{vmatrix} A & 0 \\ -I_n & B \end{vmatrix} = (-1)^{n(m+1)}\, |AB|. \tag{3-111} \]


It is now necessary to evaluate the determinant of order n + m on
the left. We shall use Laplace's expansion by the first m rows. First,
note that nonvanishing determinants of order m can be formed only from
the columns of A, since the use of any other columns would introduce a
column of zeros and hence a vanishing determinant. Thus, there are no
more than n!/[m!(n − m)!] nonvanishing terms in the expansion. The
complementary minor to any determinant Δ of order m formed from A
will have n − m columns from $-I_n$. This complementary minor of
order n will be of the form

\[ |-e_{u_1}, -e_{u_2}, \ldots, -e_{u_{n-m}}, B|, \tag{3-112} \]

where $u_1, u_2, \ldots, u_{n-m}$ refer to the columns of A not in Δ. We can
immediately expand by cofactors, proceeding from the first column to the
second, etc., to the (n − m)th column. Note that, aside from sign,
this expansion crosses out rows $u_1, u_2, \ldots, u_{n-m}$ of B so that, in the end,
we obtain a determinant of order m formed from B which contains the same
rows as the corresponding columns chosen from A to be in Δ.

Next, we shall discuss the sign problem. First, the sign of the
complementary minor is

\[ (-1)^{(1/2)m(m+1) + j_1 + j_2 + \cdots + j_m}, \]

where $j_1, \ldots, j_m$ are the indices of the columns chosen from A to be in Δ.
Then, in the expansion of the complementary minor, the minus signs in
the $-e_{u_i}$ yield $(-1)^{n-m}$. Finally there are the signs coming from the
expansion by cofactors. These contribute

\[ (-1)^{1+u_1} (-1)^{1+u_2-1} \cdots (-1)^{1+u_{n-m}-(n-m-1)}
= (-1)^{n-m+u_1+\cdots+u_{n-m}-(1/2)(n-m-1)(n-m)}, \tag{3-113} \]

since we expand each time by the first column, and the row index is cut
down as a result of preceding expansions. Thus the sign is

\[ (-1)^{(1/2)m(m+1) - (1/2)(n-m)(n-m-1) + j_1+\cdots+j_m + u_1+\cdots+u_{n-m} + 2(n-m)}. \tag{3-114} \]

Note that since the indices of all columns of A not in $j_1, \ldots, j_m$ appear in
$u_1, \ldots, u_{n-m}$, it must be true that

\[ j_1 + \cdots + j_m + u_1 + \cdots + u_{n-m} = 1 + 2 + \cdots + n = \tfrac{1}{2} n(n+1). \tag{3-115} \]

Cancellation shows that the sign depends only on $(-1)^{n(m+1)}$ and not on
the m columns chosen to be in Δ. Hence (3-111) demonstrates that in
the expansion of |AB| a plus sign should be associated with each of the
n!/[m!(n − m)!] products of two mth-order determinants obtained in the
expansion of the left-hand side of (3-111).


We have proved that if A is m × n and B is n × m (m < n), then |AB|
can be represented as the sum of n!/[m!(n − m)!] terms. Each term is the
product of two mth-order determinants formed from A and from B, respectively.
A given mth-order determinant formed from the columns $j_1, \ldots, j_m$ of A
multiplies the determinant formed from rows $j_1, \ldots, j_m$ of B.


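Example:

\[ A = \begin{bmatrix} 1 & 4 & 5 \\ 2 & 0 & 3 \end{bmatrix}, \quad
B = \begin{bmatrix} 3 & 0 \\ 9 & 2 \\ 1 & 7 \end{bmatrix}, \]

\[ AB = \begin{bmatrix} 1 & 4 & 5 \\ 2 & 0 & 3 \end{bmatrix}
\begin{bmatrix} 3 & 0 \\ 9 & 2 \\ 1 & 7 \end{bmatrix}
= \begin{bmatrix} 44 & 43 \\ 9 & 21 \end{bmatrix}, \]

\[ |AB| = 537. \]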


However, |AB| can be expressed as the sum of three terms, each of which 
is the product of a 2 X 2 determinant from A and a 2 X 2 determinant 
from B. The three pairs of determinants which can be formed from A 
and B, respectively, are: From columns 1, 2 of A and rows 1, 2 of B, 


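\[ |A_1| = \begin{vmatrix} 1 & 4 \\ 2 & 0 \end{vmatrix} = -8, \quad
|B_1| = \begin{vmatrix} 3 & 0 \\ 9 & 2 \end{vmatrix} = 6; \]

from columns 1, 3 of A and rows 1, 3 of B,

\[ |A_2| = \begin{vmatrix} 1 & 5 \\ 2 & 3 \end{vmatrix} = -7, \quad
|B_2| = \begin{vmatrix} 3 & 0 \\ 1 & 7 \end{vmatrix} = 21; \]

from columns 2, 3 of A and rows 2, 3 of B,

\[ |A_3| = \begin{vmatrix} 4 & 5 \\ 0 & 3 \end{vmatrix} = 12, \quad
|B_3| = \begin{vmatrix} 9 & 2 \\ 1 & 7 \end{vmatrix} = 61. \]

Hence,

\[ |AB| = |A_1|\,|B_1| + |A_2|\,|B_2| + |A_3|\,|B_3| \]

\[ = -8(6) - 7(21) + 12(61) = -48 - 147 + 732 = 537; \]

the results are indeed identical.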
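
The same bookkeeping can be automated. The sketch below is an illustration only (the helper names `det`, `matmul`, and `det_of_product` are assumptions, not the author's notation): it sums, over every increasing choice of m columns of A, the product of the mth-order determinant from those columns of A and the determinant from the same rows of B.

```python
from itertools import combinations


def det(M):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    if not M:
        return 1
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))


def matmul(X, Y):
    """Ordinary matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def det_of_product(A, B):
    """|AB| for A (m x n) and B (n x m), m <= n, as a sum of products of minors."""
    m, n = len(A), len(B)
    total = 0
    for cols in combinations(range(n), m):
        minor_A = [[A[i][j] for j in cols] for i in range(m)]  # columns j_1..j_m of A
        minor_B = [B[j] for j in cols]                          # rows j_1..j_m of B
        total += det(minor_A) * det(minor_B)
    return total


A = [[1, 4, 5], [2, 0, 3]]
B = [[3, 0], [9, 2], [1, 7]]
print(det_of_product(A, B), det(matmul(A, B)))   # 537 537
```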

3-18 The matrix inverse. Given any real number a ≠ 0, there is a
number $a^{-1}$ (the inverse or reciprocal of a) such that $a^{-1}a = aa^{-1} = 1$.
We shall now investigate whether matrices possess this inverse property
and, if so, under what circumstances. We ask: For any given matrix A
does there exist a matrix B such that

\[ AB = BA = I? \tag{3-116} \]

If such a matrix B does exist, it is called the inverse of A. This inverse
is usually written $A^{-1}$ (it should be noted that $A^{-1}$ does not mean 1/A
or I/A; we have no rules for dividing matrices). $A^{-1}$ is merely the symbol
given to matrix B in (3-116).

First of all, we note that A can have an inverse only if A is a square
matrix, since the products AB and BA cannot both represent the same
identity matrix unless A is square. Hence the inverse will also be square.

Matrix inverse: Given a square matrix A. If there exists a square
matrix $A^{-1}$ which satisfies the relation

\[ A^{-1}A = AA^{-1} = I, \tag{3-117} \]

then $A^{-1}$ is called the inverse or reciprocal of A.

Next we wish to compute the inverse when it exists. As a matter of fact,
we have already done this at the end of Section 3-14, where we showed
that

\[ AA^{+} = A^{+}A = |A|\, I. \tag{3-118} \]

If we define

\[ A^{-1} = \frac{1}{|A|}\, A^{+}, \tag{3-119} \]

then this matrix satisfies (3-117) and hence is an inverse of A. The
inverse can be formed in this way only if |A| ≠ 0.

Singular and nonsingular matrices: The square matrix A is said
to be singular if |A| = 0, nonsingular if |A| ≠ 0.

We have shown that every nonsingular matrix has an inverse. It is
also true that only nonsingular matrices have inverses. (In other words,
if A is singular, there is no matrix B such that AB = BA = I.) We shall
prove this fact in Chapter 4.

If A has an inverse, then this inverse is unique. To see this, suppose
that B is a matrix such that AB = BA = I; and that, in addition, there
is another matrix D such that AD = DA = I. Multiplying AB = I
on the left by D, we obtain

\[ DAB = DI = D. \tag{3-120} \]

However, by assumption, DA = I, and expression (3-120) reduces to
B = D; hence the inverse is unique, and it is permissible to speak of $A^{-1}$
as the inverse of A [see (3-119)]. Furthermore, if A is nonsingular and a
matrix B is such that AB = I, then $B = A^{-1}$, and consequently, BA = I
also.* To prove this, we only have to multiply AB = I on the left by
$A^{-1}$ and recall that $A^{-1}A = I$. In mathematical terminology, the fact
that BA = I if AB = I, and vice versa, indicates that a right inverse
for A is also a left inverse, and vice versa.


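Examples: (1) Let us compute the general formula for the inverse of
a 2 × 2 nonsingular matrix A:

\[ A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}. \]

The matrix of the cofactors (in this case, the cofactors are determinants
of first order) is

\[ \begin{bmatrix} a_{22} & -a_{21} \\ -a_{12} & a_{11} \end{bmatrix}. \]

Thus the adjoint (transpose of the above matrix) is

\[ A^{+} = \begin{bmatrix} a_{22} & -a_{12} \\ -a_{21} & a_{11} \end{bmatrix}, \]

and

\[ A^{-1} = \frac{1}{|A|} \begin{bmatrix} a_{22} & -a_{12} \\ -a_{21} & a_{11} \end{bmatrix}. \]

To prove that this is the inverse of A we only need to show that $A^{-1}A = I$:

\[ A^{-1}A = \frac{1}{|A|}
\begin{bmatrix} a_{22} & -a_{12} \\ -a_{21} & a_{11} \end{bmatrix}
\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}
= \frac{1}{|A|}
\begin{bmatrix} a_{22}a_{11} - a_{12}a_{21} & 0 \\ 0 & a_{22}a_{11} - a_{12}a_{21} \end{bmatrix}
= \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = I. \]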


* In fact, if AB = I for two nth-order matrices A and B, then both are
nonsingular and $B = A^{-1}$. It is not necessary to assume that A is nonsingular since
this follows from AB = I (see proof in Chapter 4).

(2) Assume

\[ A = \begin{bmatrix} 1 & 0 \\ 2 & 3 \end{bmatrix}; \quad \text{then} \quad |A| = 3, \quad
A^{+} = \begin{bmatrix} 3 & 0 \\ -2 & 1 \end{bmatrix}, \]

and

\[ A^{-1} = \begin{bmatrix} 1 & 0 \\ -2/3 & 1/3 \end{bmatrix}. \]

This is easily proved by computing

\[ A^{-1}A = \begin{bmatrix} 1 & 0 \\ -2/3 & 1/3 \end{bmatrix}
\begin{bmatrix} 1 & 0 \\ 2 & 3 \end{bmatrix}
= \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} = I. \]

(3) If $A = (a_{11})$, then $|A| = a_{11}$, $A^{+} = (1)$, and

\[ A^{-1} = \left( \frac{1}{a_{11}} \right), \quad a_{11} \neq 0. \]

When a matrix is a single element, the inverse is also a single element, i.e.,
the reciprocal of that in A.

Note: It is always possible to determine whether a given matrix is the 
inverse of some matrix; we simply multiply the two matrices and see 
whether the identity matrix is obtained. (See footnote page 104.) 
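
Formula (3-119) translates directly into code. The sketch below is illustrative rather than definitive: the function names `det`, `adjoint`, and `inverse` are assumptions, integer entries are assumed, and exact rational arithmetic is used so that the check against the identity matrix is exact.

```python
from fractions import Fraction


def det(M):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    if not M:
        return 1
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))


def adjoint(A):
    """Adjoint A+: the transpose of the matrix of cofactors of A."""
    n = len(A)

    def cofactor(i, j):
        minor = [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]
        return (-1) ** (i + j) * det(minor)

    # entry (i, j) of A+ is the cofactor of element a_ji
    return [[cofactor(j, i) for j in range(n)] for i in range(n)]


def inverse(A):
    """A^{-1} = A+ / |A|, defined only when |A| != 0 (integer entries assumed)."""
    d = det(A)
    if d == 0:
        raise ValueError("singular matrix: |A| = 0")
    return [[Fraction(x, d) for x in row] for row in adjoint(A)]


A = [[1, 0], [2, 3]]
print(adjoint(A))    # [[3, 0], [-2, 1]]
print(inverse(A))    # rows (1, 0) and (-2/3, 1/3)
```

For anything beyond small matrices this route is slow, since every entry of the adjoint requires its own determinant.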


3-19 Properties of the inverse. We shall discuss now the inverse of
the product of two nonsingular matrices A, B. We shall show that the
product of two nth-order nonsingular matrices is nonsingular, and

\[ (AB)^{-1} = B^{-1}A^{-1}. \tag{3-121} \]

We prove this by noting that

\[ B^{-1}A^{-1}AB = B^{-1}IB = B^{-1}B = I, \]

and

\[ ABB^{-1}A^{-1} = AIA^{-1} = AA^{-1} = I. \]

Thus $B^{-1}A^{-1}$ satisfies (3-117); hence it is the unique inverse. In the same
way, it can be shown that the product of any finite number of nonsingular
matrices is nonsingular and that the inverse is the product of the
inverses in reverse order. In Problem 3-34 the reader is required to prove
this statement.

Example:

\[ A = \begin{bmatrix} 1 & 0 \\ 2 & 3 \end{bmatrix}, \quad
B = \begin{bmatrix} 2 & 5 \\ 2 & 1 \end{bmatrix}, \quad
C = AB = \begin{bmatrix} 2 & 5 \\ 10 & 13 \end{bmatrix}; \]

\[ A^{-1} = \begin{bmatrix} 1 & 0 \\ -2/3 & 1/3 \end{bmatrix}, \quad
B^{-1} = \begin{bmatrix} -1/8 & 5/8 \\ 1/4 & -1/4 \end{bmatrix}, \quad
C^{-1} = \begin{bmatrix} -13/24 & 5/24 \\ 10/24 & -2/24 \end{bmatrix}; \]

\[ C^{-1} = (AB)^{-1} = B^{-1}A^{-1}
= \begin{bmatrix} -1/8 & 5/8 \\ 1/4 & -1/4 \end{bmatrix}
\begin{bmatrix} 1 & 0 \\ -2/3 & 1/3 \end{bmatrix}
= \begin{bmatrix} -13/24 & 5/24 \\ 10/24 & -2/24 \end{bmatrix}. \]

Observe the similarity between the formulas for the inverse and the
transpose of a product.

If A is nonsingular, then

\[ (A^{-1})^{-1} = A. \tag{3-122} \]

This follows immediately from $AA^{-1} = A^{-1}A = I$ if we consider $A^{-1}$
to be the given matrix rather than A. Hence, A is the unique inverse of
$A^{-1}$, and (3-122) holds.

The identity matrix is its own inverse, since $I_n I_n = I_n$ implies that

\[ I_n^{-1} = I_n. \]

Next, we shall demonstrate that the inverse of the transpose is the
transpose of the inverse, that is,

\[ (A')^{-1} = (A^{-1})'. \tag{3-123} \]

We start with

\[ AA^{-1} = A^{-1}A = I. \]

Taking the transpose and noting I' = I, we obtain

\[ (A^{-1})'A' = I = A'(A^{-1})'. \tag{3-124} \]

Thus $(A^{-1})'$ is the inverse of A', and (3-123) follows.

Example:



Hence, as expected, (3-123) holds in this case.
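
Properties (3-121), (3-122), and (3-123) can be spot-checked numerically. The following is a minimal sketch, assuming the 2 × 2 matrices A and B used earlier in this section; `inv2`, `matmul`, and `transpose` are hypothetical helper names, and `inv2` applies the 2 × 2 inverse formula of Section 3-18, Example 1.

```python
from fractions import Fraction


def matmul(X, Y):
    """Ordinary matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def transpose(M):
    return [list(col) for col in zip(*M)]


def inv2(M):
    """Inverse of a nonsingular 2 x 2 matrix from the adjoint formula."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        raise ValueError("singular matrix")
    return [[Fraction(d) / det, Fraction(-b) / det],
            [Fraction(-c) / det, Fraction(a) / det]]


A = [[1, 0], [2, 3]]
B = [[2, 5], [2, 1]]

print(inv2(matmul(A, B)) == matmul(inv2(B), inv2(A)))   # (3-121)
print(inv2(inv2(A)) == A)                               # (3-122)
print(inv2(transpose(A)) == transpose(inv2(A)))         # (3-123)
```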

We have previously seen that 

AB = 0 (3-125) 

did not necessarily imply that A = 0 or B = 0. If, however, either A
or B is nonsingular, then the other is a null matrix. Suppose A is
nonsingular, and (3-125) holds. We premultiply by $A^{-1}$ and obtain

\[ A^{-1}AB = 0 = IB = B; \quad B = 0. \]

Thus B is a null matrix. The same proof applies to a nonsingular matrix B.
Hence the product of two nonsingular matrices cannot be a null matrix.
If we write a nonsingular matrix B as a row of column vectors,

\[ B = (b_1, \ldots, b_n), \tag{3-126} \]

and $B^{-1}$ as a column of row vectors,

\[ B^{-1} = [\beta^1, \ldots, \beta^n], \tag{3-127} \]

then

\[ B^{-1}B = \|\beta^i b_j\| = I. \tag{3-128} \]

Consequently,

\[ \beta^i b_j = \delta_{ij}, \quad \text{all } i, j. \tag{3-129} \]

Equation (3-129) shows that row i of $B^{-1}$ is orthogonal to every column
of B except column i.


3-20 Computation of the inverse by partitioning. In this section and
in Section 3-21 we shall consider special ways of computing the inverse
of a nonsingular matrix. These methods are of theoretical interest;
furthermore, variations have been used as numerical procedures for
inverting matrices on digital computers.

Suppose we have an n × n nonsingular matrix M. Let M be partitioned
as follows:*

\[ M = \begin{bmatrix} \alpha & \beta \\ \gamma & \delta \end{bmatrix}, \tag{3-130} \]

where α is an s × s submatrix, β an s × m submatrix, γ an m × s
submatrix, and δ an m × m submatrix (n = m + s). $M^{-1}$ exists and will
be partitioned in the same way as M, that is,

\[ M^{-1} = \begin{bmatrix} A & B \\ C & D \end{bmatrix}, \tag{3-131} \]

where A is s × s, B is s × m, C is m × s, and D is m × m.

Assume that δ has an inverse and that $\delta^{-1}$ is known. Then, since
$MM^{-1} = I$,

\[ \begin{bmatrix} \alpha & \beta \\ \gamma & \delta \end{bmatrix}
\begin{bmatrix} A & B \\ C & D \end{bmatrix}
= \begin{bmatrix} I_s & 0 \\ 0 & I_m \end{bmatrix}. \tag{3-132} \]

Four equations are obtained for the four unknown submatrices A, B, C, D:

\[ \alpha A + \beta C = I_s, \tag{3-133} \]

\[ \alpha B + \beta D = 0, \tag{3-134} \]

\[ \gamma A + \delta C = 0, \tag{3-135} \]

\[ \gamma B + \delta D = I_m. \tag{3-136} \]

From (3-135),

\[ C = -\delta^{-1}\gamma A. \tag{3-137} \]

Substituting this into (3-133), we obtain

\[ \alpha A - \beta\delta^{-1}\gamma A = I_s, \]

or by definition of the inverse,

\[ A = (\alpha - \beta\delta^{-1}\gamma)^{-1}. \tag{3-138} \]

Using (3-136), we arrive at

\[ D = \delta^{-1} - \delta^{-1}\gamma B. \tag{3-139} \]

* In this section we have dropped our convention of using upper-case roman
letters for submatrices.

From (3-134) and (3-139),

\[ \alpha B + \beta\delta^{-1} - \beta\delta^{-1}\gamma B = 0, \]

\[ (\alpha - \beta\delta^{-1}\gamma) B = -\beta\delta^{-1}. \]

Using (3-138), we get

\[ B = -A\beta\delta^{-1}. \tag{3-140} \]

We have obtained four formulas which can be solved sequentially for
A, B, C, D. They are

\[ A = (\alpha - \beta\delta^{-1}\gamma)^{-1}, \tag{3-141} \]

\[ B = -A\beta\delta^{-1}, \tag{3-142} \]

\[ C = -\delta^{-1}\gamma A, \tag{3-143} \]

\[ D = \delta^{-1} - \delta^{-1}\gamma B. \tag{3-144} \]

Since $M^{-1}$ exists, the submatrices A, B, C, D exist. Hence if $\delta^{-1}$ exists,
all the operations can be carried out, and A, B, C, D can be computed.

Imagine that we wish to invert a matrix by means of the preceding
formulas. If we partition the matrix so that δ is of a size which can be
inverted easily, then we shall find it difficult to obtain A according to
(3-141) if the order of M is fairly large. We are able to avoid this difficulty
by partitioning in two or more steps. If δ is 1 × 1 or 2 × 2, $\delta^{-1}$ is easily
computed. If A is 1 × 1, then it is given by

\[ A = \left( \frac{1}{\alpha - \beta\delta^{-1}\gamma} \right). \quad \text{(See Section 3-18, Example 3.)} \]

It is also easy to find A if it is a 2 × 2 matrix. For example, if we wish to
obtain the inverse of

\[ M = \left[ \begin{array}{cc|ccc}
m_{11} & m_{12} & m_{13} & m_{14} & m_{15} \\
m_{21} & m_{22} & m_{23} & m_{24} & m_{25} \\ \hline
m_{31} & m_{32} & m_{33} & m_{34} & m_{35} \\
m_{41} & m_{42} & m_{43} & m_{44} & m_{45} \\
m_{51} & m_{52} & m_{53} & m_{54} & m_{55}
\end{array} \right]
= \begin{bmatrix} \alpha & \beta \\ \gamma & \delta \end{bmatrix}, \tag{3-145} \]

we might start with the above partition. To find $\delta^{-1}$, we might
partition δ as

\[ \delta = \left[ \begin{array}{c|cc}
m_{33} & m_{34} & m_{35} \\ \hline
m_{43} & m_{44} & m_{45} \\
m_{53} & m_{54} & m_{55}
\end{array} \right]
= \begin{bmatrix} \alpha' & \beta' \\ \gamma' & \delta' \end{bmatrix}. \tag{3-146} \]

Now $(\delta')^{-1}$ is easily found, and hence $\delta^{-1}$ can be obtained. Using $\delta^{-1}$,
we compute $M^{-1}$. We have simply applied the partitioning procedure
in two steps. To invert a large matrix, the same procedure can be
repeated a number of times.

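Example: Let us compute the inverse of

\[ M = \left[ \begin{array}{c|cc} 2 & 1 & 0 \\ \hline 3 & 0 & 5 \\ 7 & 6 & 4 \end{array} \right]
= \begin{bmatrix} \alpha & \beta \\ \gamma & \delta \end{bmatrix}, \quad
M^{-1} = \begin{bmatrix} A & B \\ C & D \end{bmatrix}. \]

We begin by partitioning the matrix as shown. Then

\[ \delta = \begin{bmatrix} 0 & 5 \\ 6 & 4 \end{bmatrix}; \quad
\delta^{-1} = \begin{bmatrix} -2/15 & 1/6 \\ 1/5 & 0 \end{bmatrix}. \]

From (3-141):

\[ A = \left[ 2 - (1, 0) \begin{bmatrix} -2/15 & 1/6 \\ 1/5 & 0 \end{bmatrix}
\begin{bmatrix} 3 \\ 7 \end{bmatrix} \right]^{-1} = 30/37. \]

Using (3-142), we obtain

\[ B = -(30/37)\,(1, 0) \begin{bmatrix} -2/15 & 1/6 \\ 1/5 & 0 \end{bmatrix} = (4/37,\ -5/37). \]

From (3-143):

\[ C = -\begin{bmatrix} -2/15 & 1/6 \\ 1/5 & 0 \end{bmatrix}
\begin{bmatrix} 3 \\ 7 \end{bmatrix} (30/37)
= \begin{bmatrix} -23/37 \\ -18/37 \end{bmatrix}. \]

Using (3-144), we arrive at

\[ D = \begin{bmatrix} -2/15 & 1/6 \\ 1/5 & 0 \end{bmatrix}
- \begin{bmatrix} -2/15 & 1/6 \\ 1/5 & 0 \end{bmatrix}
\begin{bmatrix} 3 \\ 7 \end{bmatrix} (4/37,\ -5/37)
= \begin{bmatrix} -8/37 & 10/37 \\ 5/37 & 3/37 \end{bmatrix}. \]

Combining all the results, we see that

\[ M^{-1} = \frac{1}{37} \begin{bmatrix} 30 & 4 & -5 \\ -23 & -8 & 10 \\ -18 & 5 & 3 \end{bmatrix}. \]

This result can be checked by showing that $M^{-1}M = I$.

As a special case of the general method for inverting a matrix consider

\[ M = \begin{bmatrix} I & Q \\ 0 & R \end{bmatrix}, \tag{3-147} \]

where $R^{-1}$ exists and is known, and I is, of course, the identity submatrix.
Using the equations previously determined, we find

\[ A = (I - QR^{-1}0)^{-1} = I, \]

\[ B = -IQR^{-1} = -QR^{-1}, \]

\[ C = -R^{-1}0I = 0, \]

\[ D = R^{-1} - R^{-1}0(-QR^{-1}) = R^{-1}. \]

Therefore,

\[ M^{-1} = \begin{bmatrix} I & -QR^{-1} \\ 0 & R^{-1} \end{bmatrix}. \tag{3-148} \]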
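
The four formulas (3-141) through (3-144) can be applied mechanically. The sketch below reworks the 3 × 3 example with α taken as 1 × 1 and δ as 2 × 2; it is an illustration under those assumptions only. The helpers `matmul`, `sub`, `neg`, and `inv2` are hypothetical names, and exact fractions are used so the result can be compared entry by entry with the value found above.

```python
from fractions import Fraction as F


def matmul(X, Y):
    """Ordinary matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def sub(X, Y):
    return [[x - y for x, y in zip(r, s)] for r, s in zip(X, Y)]


def neg(X):
    return [[-x for x in r] for r in X]


def inv2(M):
    """Inverse of a nonsingular 2 x 2 matrix (adjoint formula)."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


# Partition of M = [[2, 1, 0], [3, 0, 5], [7, 6, 4]] with s = 1, m = 2
alpha = [[F(2)]]
beta = [[F(1), F(0)]]
gamma = [[F(3)], [F(7)]]
delta = [[F(0), F(5)], [F(6), F(4)]]

d_inv = inv2(delta)
schur = sub(alpha, matmul(matmul(beta, d_inv), gamma))   # alpha - beta delta^-1 gamma (1 x 1)
A = [[1 / schur[0][0]]]                                  # (3-141)
B = neg(matmul(matmul(A, beta), d_inv))                  # (3-142)
C = neg(matmul(matmul(d_inv, gamma), A))                 # (3-143)
D = sub(d_inv, matmul(matmul(d_inv, gamma), B))          # (3-144)

# Reassemble M^{-1} = [[A, B], [C, D]]
M_inv = [A[0] + B[0]] + [C[i] + D[i] for i in range(2)]
print([[x * 37 for x in row] for row in M_inv])
```

Multiplying through by 37 should reproduce the integer matrix in the worked example.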


3-21 Product form of the inverse. Let us suppose that we have a
matrix B and know $B^{-1}$. A single column in B is changed, and we
desire to find the inverse of the new matrix. Let us write B as a row of column
vectors,

\[ B = (b_1, b_2, \ldots, b_r, \ldots, b_n). \tag{3-149} \]

The column $b_r$ is removed and replaced by column vector a. We wish
to compute the inverse of the new matrix $B_a$, with a replacing $b_r$ in column r:

\[ B_a = (b_1, b_2, \ldots, b_{r-1}, a, b_{r+1}, \ldots, b_n). \tag{3-150} \]

We shall approach the problem by trying to write a as a linear
combination of the columns of B, that is,

\[ a = \sum_{i=1}^{n} y_i b_i = By, \quad y = [y_1, \ldots, y_n]. \tag{3-151} \]

Since B has an inverse, we can premultiply (3-151) by $B^{-1}$ and obtain

\[ y = B^{-1}a. \tag{3-152} \]


Thus, we can indeed express a as a linear combination of the columns of B.
If $y_r \neq 0$, then

\[ b_r = -\frac{y_1}{y_r} b_1 - \frac{y_2}{y_r} b_2 - \cdots - \frac{y_{r-1}}{y_r} b_{r-1}
+ \frac{1}{y_r} a - \frac{y_{r+1}}{y_r} b_{r+1} - \cdots - \frac{y_n}{y_r} b_n. \tag{3-153} \]

Let us define the column vector

\[ \eta = \left[ -\frac{y_1}{y_r}, -\frac{y_2}{y_r}, \ldots, -\frac{y_{r-1}}{y_r}, \frac{1}{y_r},
-\frac{y_{r+1}}{y_r}, \ldots, -\frac{y_n}{y_r} \right]. \tag{3-154} \]

With this definition, (3-153) can be written as the matrix product

\[ b_r = B_a \eta. \tag{3-155} \]

On the right, we have $B_a$ and, on the left, one column of matrix B. The
question is: How can we replace $b_r$ on the left by B? Obviously, we
must substitute a matrix for the vector η; this matrix must have η as its
rth column, and its remaining columns must be such that, when multiplied
by $B_a$, they yield the proper columns of B. Let the other columns in the
matrix replacing η be $\alpha_1, \alpha_2, \ldots, \alpha_n$. Then

\[ B = (b_1, b_2, \ldots, b_r, \ldots, b_n) = B_a(\alpha_1, \alpha_2, \ldots, \eta, \ldots, \alpha_n)
= (B_a\alpha_1, \ldots, B_a\eta, \ldots, B_a\alpha_n). \tag{3-156} \]

Consequently,

\[ b_1 = B_a\alpha_1, \quad b_2 = B_a\alpha_2, \quad \ldots, \quad b_r = B_a\eta, \quad \ldots, \quad b_n = B_a\alpha_n. \tag{3-157} \]

The first column of $B_a$ is $b_1$. Hence if $\alpha_1 = e_1$, then $B_a e_1 = b_1$.
Similarly, the nth column of $B_a$ is $b_n$, and $\alpha_n = e_n$. Thus

\[ \alpha_1 = e_1, \quad \alpha_2 = e_2, \quad \ldots, \quad \alpha_{r-1} = e_{r-1}, \quad \alpha_{r+1} = e_{r+1}, \quad \ldots, \quad \alpha_n = e_n. \]

Define the matrix

\[ E = (e_1, e_2, \ldots, e_{r-1}, \eta, e_{r+1}, \ldots, e_n). \tag{3-158} \]

Equation (3-156) becomes

\[ B = B_a E, \tag{3-159} \]

or

\[ I = B_a E B^{-1}. \]

Consequently,

\[ B_a^{-1} = E B^{-1}. \tag{3-160} \]

We see that $B_a^{-1}$ will exist if $EB^{-1}$ or E does. Matrix E will certainly
exist provided $y_r \neq 0$.

Consider what happens when $y_r = 0$. It means that the vector a is a
linear combination of the vectors $b_1, b_2, \ldots, b_{r-1}, b_{r+1}, \ldots, b_n$. Hence
in $B_a$ the rth column can be represented as a linear combination of the
other columns in $B_a$. From our study of determinants we know that,
in this case, $|B_a| = 0$, and therefore $B_a$ has no inverse. Thus when $y_r = 0$,
the new matrix has no inverse.

Let us review the procedure for computing the inverse of a matrix
which differs only in a single column from a matrix whose inverse we know.
A matrix $B_a$ is formed by replacing column r, $b_r$, of B with a. We
know $B^{-1}$ and wish to determine $B_a^{-1}$. First, we compute

\[ y = B^{-1}a. \tag{3-161} \]

Then we form the column vector

\[ \eta = \left[ -\frac{y_1}{y_r}, -\frac{y_2}{y_r}, \ldots, -\frac{y_{r-1}}{y_r}, \frac{1}{y_r},
-\frac{y_{r+1}}{y_r}, \ldots, -\frac{y_n}{y_r} \right]. \tag{3-162} \]

We replace the rth column of the identity matrix $I_n$ with η and obtain E.
Then

\[ B_a^{-1} = EB^{-1}. \tag{3-163} \]

The procedure outlined above resembles to a considerable extent the
technique used for inserting a vector into a basis (Section 2-9). It is
exactly that. We can easily see that, when B has an inverse, the columns
of B form a basis for $E^n$. We only need to show that the columns are
linearly independent. Consider the problem of finding $\lambda_i$ satisfying

\[ \sum \lambda_i b_i = 0. \]

In matrix form, this is

\[ B\lambda = 0; \]

however, since $B^{-1}$ exists,

\[ B^{-1}B\lambda = B^{-1}0 = 0 = \lambda. \tag{3-164} \]

Thus all $\lambda_i$ are zero, and the $b_i$ are linearly independent. A new vector a
now replaces $b_r$. If $y_r \neq 0$, then the new set of vectors also forms a basis
since $B_a^{-1}$ exists. The interesting connection between inverses, bases, etc.,
will be explored more thoroughly in the next chapter.

The method discussed above can be used to invert any nonsingular
matrix $B = (b_1, \ldots, b_n)$. We start with the identity matrix $I_n$ ($I_n^{-1} = I_n$)
and, for example, replace the first column of $I_n$ by column 1 of B. The
inverse of the new matrix with $b_1$ in the first column is

\[ B_1^{-1} = E_1 I = E_1. \tag{3-165} \]

Into column 2 of $B_1$ we insert column 2, $b_2$, of B and obtain $B_2$. The
new inverse is

\[ B_2^{-1} = E_2 B_1^{-1} = E_2 E_1. \tag{3-166} \]

We continue to insert one column at a time until we get

\[ B^{-1} = B_n^{-1} = E_n E_{n-1} \cdots E_2 E_1 \tag{3-167} \]

and

\[ E_i = (e_1, \ldots, e_{i-1}, \eta_i, e_{i+1}, \ldots, e_n), \tag{3-168} \]

where $\eta_i$ is in the ith column. Furthermore,

\[ \eta_i = \left[ -\frac{y_{1i}}{y_{ii}}, -\frac{y_{2i}}{y_{ii}}, \ldots, -\frac{y_{i-1,i}}{y_{ii}}, \frac{1}{y_{ii}},
-\frac{y_{i+1,i}}{y_{ii}}, \ldots, -\frac{y_{ni}}{y_{ii}} \right], \tag{3-169} \]

\[ y_i = B_{i-1}^{-1} b_i = E_{i-1} E_{i-2} \cdots E_1 I b_i, \quad i \geq 2; \quad y_1 = b_1. \tag{3-170} \]

When $B^{-1}$ is expressed as the product of E matrices as in (3-167), it is
called a product form of the inverse. Form (3-167) is not unique.
Different orders of insertion of the $b_i$ may be used and may be required.

Example: Compute the inverse of

\[ B = \begin{bmatrix} 3 & 4 & 3 & 9 \\ 8 & 4 & 8 & 3 \\ 9 & 5 & 1 & 9 \\ 5 & 9 & 5 & 5 \end{bmatrix} \]

and write a product form of the inverse. We begin with the identity
matrix and insert the first column of B into column 1 of I. Then

\[ y_1 = I b_1 = b_1 = [3, 8, 9, 5], \quad \eta_1 = [1/3, -8/3, -9/3, -5/3], \]

\[ B_1^{-1} = E_1 = \begin{bmatrix} 1/3 & 0 & 0 & 0 \\ -8/3 & 1 & 0 & 0 \\ -9/3 & 0 & 1 & 0 \\ -5/3 & 0 & 0 & 1 \end{bmatrix}. \]

Inserting the second column of B into column 2 of $B_1$, we obtain

\[ y_2 = E_1 b_2 = [4/3, -20/3, -21/3, 7/3], \quad \eta_2 = [1/5, -3/20, -21/20, 7/20], \]

\[ E_2 = \begin{bmatrix} 1 & 1/5 & 0 & 0 \\ 0 & -3/20 & 0 & 0 \\ 0 & -21/20 & 1 & 0 \\ 0 & 7/20 & 0 & 1 \end{bmatrix}; \quad
B_2^{-1} = E_2 E_1 = \begin{bmatrix} -1/5 & 1/5 & 0 & 0 \\ 2/5 & -3/20 & 0 & 0 \\ -1/5 & -21/20 & 1 & 0 \\ -13/5 & 7/20 & 0 & 1 \end{bmatrix}. \]

Inserting the third column of B into column 3 of $B_2$, we get

\[ y_3 = B_2^{-1} b_3 = [1, 0, -8, 0], \quad \eta_3 = [1/8, 0, -1/8, 0], \]

\[ E_3 = \begin{bmatrix} 1 & 0 & 1/8 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & -1/8 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}; \quad
B_3^{-1} = E_3 B_2^{-1} = \begin{bmatrix} -9/40 & 11/160 & 1/8 & 0 \\ 2/5 & -3/20 & 0 & 0 \\ 1/40 & 21/160 & -1/8 & 0 \\ -13/5 & 7/20 & 0 & 1 \end{bmatrix}. \]

Finally, we insert the last column of B into column 4 of $B_3$ and arrive at

\[ y_4 = B_3^{-1} b_4 = [-111/160, 63/20, -81/160, -347/20], \]

\[ \eta_4 = (1/347)\,[-111/8, 63, -81/8, -20], \]

\[ E_4 = \frac{1}{347} \begin{bmatrix} 347 & 0 & 0 & -111/8 \\ 0 & 347 & 0 & 63 \\ 0 & 0 & 347 & -81/8 \\ 0 & 0 & 0 & -20 \end{bmatrix}; \]

thus

\[ B^{-1} = B_4^{-1} = E_4 B_3^{-1} = \frac{1}{347} \begin{bmatrix} -42 & 19 & 347/8 & -111/8 \\ -25 & -30 & 0 & 63 \\ 35 & 42 & -347/8 & -81/8 \\ 52 & -7 & 0 & -20 \end{bmatrix}. \]

A product form of the inverse is

\[ B^{-1} = E_4 E_3 E_2 E_1. \]

The last few sections have amply demonstrated that the computation
of the inverse of a matrix, especially when n is large, is indeed an arduous
task, even for large high-speed digital computers. Computing $A^{-1}$ from
$A^{+}/|A|$ is usually a rather inefficient procedure since it requires
computation of a large number of determinants.
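
The product-form procedure of the example above can be carried out by inserting the columns of B into the identity matrix one at a time. The sketch below is a minimal illustration, assuming the columns are inserted in their natural order and that no $y_{ii}$ vanishes along the way; the function name `product_form_inverse` is hypothetical. Instead of forming each $E_i$ explicitly, it applies $E_i$ to the current inverse row by row.

```python
from fractions import Fraction as F


def product_form_inverse(B):
    """B^{-1} = E_n ... E_2 E_1, inserting columns of B in natural order (3-165)-(3-170)."""
    n = len(B)
    inv = [[F(int(i == j)) for j in range(n)] for i in range(n)]   # I_n^{-1} = I_n
    for r in range(n):
        b = [B[i][r] for i in range(n)]
        # y_r = B_{r-1}^{-1} b_r
        y = [sum(inv[i][k] * b[k] for k in range(n)) for i in range(n)]
        if y[r] == 0:
            raise ValueError("y_rr = 0: a different order of insertion is required")
        eta = [-yi / y[r] for yi in y]
        eta[r] = 1 / y[r]
        # Left-multiplying by E_r: row r is scaled by eta[r];
        # every other row i gains eta[i] times the old row r.
        old_r = inv[r]
        inv = [[eta[r] * old_r[j] if i == r else inv[i][j] + eta[i] * old_r[j]
                for j in range(n)] for i in range(n)]
    return inv


B = [[3, 4, 3, 9],
     [8, 4, 8, 3],
     [9, 5, 1, 9],
     [5, 9, 5, 5]]
B_inv = product_form_inverse(B)
print([str(x * 347) for x in B_inv[0]])   # ['-42', '19', '347/8', '-111/8']
```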

*3-22 Matrix series and the Leontief inverse. Polynomials of real
numbers can be generalized to infinite power series, that is, to polynomials
containing an infinite number of terms. Such a power series can be written

\[ \sum_{n=0}^{\infty} \lambda_n x^n. \tag{3-171} \]

For a fixed x the series is said to converge to a limit S if

\[ \lim_{N \to \infty} S_N = S, \tag{3-172} \]

where

\[ S_N = \sum_{n=0}^{N} \lambda_n x^n. \tag{3-173} \]

Thus the series converges to a limit S if for any ε > 0, however small,
there is an $N_0$ such that

\[ |S - S_N| < \epsilon, \quad \text{all } N > N_0. \tag{3-174} \]

When the series (3-171) converges to a limit, then we can write

\[ S = \sum_{n=0}^{\infty} \lambda_n x^n, \tag{3-175} \]

and S is called the sum of the series. The limit S will, of course, depend
on the value of x. Clearly, the series cannot converge unless

\[ \lim_{n \to \infty} \lambda_n x^n = 0, \tag{3-176} \]

since (3-174) cannot hold in any other case. However, the validity of
(3-176) does not necessarily imply a convergence of the series. A series
which does not converge is said to diverge.

Example: Consider the geometric series $\sum_{n=0}^{\infty} x^n$, |x| < 1:

\[ S_N = 1 + x + \cdots + x^N, \]

\[ xS_N = x + x^2 + \cdots + x^{N+1}. \]

Thus

\[ S_N = \frac{1 - x^{N+1}}{1 - x}, \quad \lim_{N \to \infty} S_N = \frac{1}{1 - x}, \quad |x| < 1. \]

Hence, when |x| < 1, the geometric series converges to the sum S =
1/(1 − x). If |x| ≥ 1, then $\lim_{n \to \infty} x^n \neq 0$, that is, the series diverges.

It is possible to develop a theory of matrix power series in analogy with
series of real numbers. We can work with powers of matrices only when
the matrices are square. Thus, if A is any square matrix, we can consider
the series

\[ \sum_{n=0}^{\infty} \lambda_n A^n, \tag{3-177} \]

where the $\lambda_n$ are scalars. It is convenient to use the definition

\[ A^0 = I. \tag{3-178} \]

Any discussion of the convergence of matrix series must be preceded
by a definition of the limit of a matrix sequence. This limit is defined
in terms of the limits of the sequences of the matrix elements. Thus the
sequence $A_n = \|(a_{ij})_n\|$ will be said to approach a limit $A = \|a_{ij}\|$ if
and only if

\[ \lim_{n \to \infty} (a_{ij})_n = a_{ij} \quad (\text{each } i, j). \tag{3-179} \]

Note that the limit A exists if and only if each matrix element approaches
a unique limit. Thus $\lim_{n \to \infty} A_n = A$ means that the limit of each of the
$n^2$ sequences of real numbers defined by (3-179) must exist and be finite.

Having developed the notion of the limit of a matrix sequence, we can
now state that series (3-177) converges and sums to the matrix S if

\[ \lim_{N \to \infty} S_N = S, \tag{3-180} \]

where

\[ S_N = \sum_{n=0}^{N} \lambda_n A^n. \tag{3-181} \]

If the series converges, we can write

\[ S = \sum_{n=0}^{\infty} \lambda_n A^n. \tag{3-182} \]

Hence, the series converges to S if

\[ \lim_{N \to \infty} \left( \sum_{n=0}^{N} \lambda_n A^n - S \right) = 0. \tag{3-183} \]

Again, the series cannot converge unless

\[ \lim_{n \to \infty} \lambda_n A^n = 0, \tag{3-184} \]

since otherwise (3-183) cannot hold. However, the validity of (3-184)
does not imply the convergence of the series (3-177).

We shall not pursue the subject of matrix series any further. Instead
we shall terminate our discussion by showing how the inverse of I − A
can be written as a power series when A has certain properties (see below).
In economics, a square matrix of the form I − A appears in connection
with the Leontief input-output model discussed in Chapter 1. In a Leontief
system, the matrix A can be defined so that it has the following
properties*:

$$0 \leq a_{ij} < 1 \quad (\text{all } i, j); \tag{3-185}$$

$$\sum_{i} a_{ij} < 1 \quad (\text{all } j). \tag{3-186}$$

Our example of a geometric series showed that $(1 - x)\sum_{k=0}^{\infty} x^k = 1$,
$|x| < 1$. This suggests that we can write

$$(\mathbf{I} - \mathbf{A})^{-1} = \sum_{k=0}^{\infty} \mathbf{A}^k = \mathbf{I} + \mathbf{A} + \mathbf{A}^2 + \cdots. \tag{3-187}$$

This is not true for an arbitrary A, but it holds when A satisfies (3-185)
and (3-186). Note that

$$\mathbf{B}_k = (\mathbf{I} - \mathbf{A})(\mathbf{I} + \mathbf{A} + \mathbf{A}^2 + \cdots + \mathbf{A}^k) = \mathbf{I} - \mathbf{A}^{k+1}. \tag{3-188}$$

Consider the matrix sequence $\mathbf{B}_k$. Then $\mathbf{B}_k$ tends to the limit I as $k \to \infty$ if

$$\lim_{k\to\infty} (\mathbf{B}_k - \mathbf{I}) = \mathbf{0}. \tag{3-189}$$

However, from (3-188):

$$\lim_{k\to\infty} (\mathbf{B}_k - \mathbf{I}) = \lim_{k\to\infty} (-\mathbf{A}^{k+1}). \tag{3-190}$$

Thus if

$$\lim_{k\to\infty} (-\mathbf{A}^{k+1}) = \mathbf{0}, \tag{3-191}$$

then

$$(\mathbf{I} - \mathbf{A}) \sum_{k=0}^{\infty} \mathbf{A}^k = \mathbf{I}, \tag{3-192}$$

and, by definition of the inverse, (3-187) is correct.


* This definition requires that the technological coefficients be measured in
monetary rather than physical units.

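It remains to be shown that (3-191) holds when the elements of the
matrix A satisfy (3-185) and (3-186). Since (3-186) is true, we can write

$$0 \leq \sum_{i} a_{ij} \leq r < 1 \quad (\text{all } j). \tag{3-193}$$

Consider any element of $\mathbf{A}^2 = \|a_{ij}^{(2)}\|$, that is,

$$a_{ij}^{(2)} = \sum_{k=1}^{n} a_{ik}a_{kj},$$

and

$$\sum_{i=1}^{n} a_{ij}^{(2)} = \sum_{i=1}^{n}\sum_{k=1}^{n} a_{ik}a_{kj} = \sum_{k=1}^{n}\left(\sum_{i=1}^{n} a_{ik}\right)a_{kj} \leq r\sum_{k=1}^{n} a_{kj} \leq r^2. \tag{3-194}$$

Hence,

$$\sum_{i=1}^{n} a_{ij}^{(2)} \leq r^2 \quad (\text{all } j),$$

which implies, since all the elements satisfy (3-185), that

$$a_{ij}^{(2)} \leq r^2 \quad (\text{all } i, j). \tag{3-195}$$

Continuing in the same manner, we find that $\sum_{i} a_{ij}^{(k)} \leq r^k$ (all $j$);
that is, any element of $\mathbf{A}^k$ satisfies

$$0 \leq a_{ij}^{(k)} \leq r^k \quad (\text{all } i, j).$$

Consequently, as $k \to \infty$, each $a_{ij}^{(k)} \to 0$, and (3-191) is valid. We have
proved that if (3-185) and (3-193) hold, (3-187) is true also.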
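
The bound used in this argument can be checked numerically. The sketch below assumes a hypothetical 3 × 3 matrix with nonnegative elements and column sums below 1 (it is not taken from the text), takes r to be its largest column sum, and verifies that the column sums of successive powers stay below the corresponding powers of r.

```python
def matmul(P, Q):
    """Product of two matrices given as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))]
            for i in range(len(P))]


def max_column_sum(P):
    """Largest column sum of P."""
    return max(sum(row[j] for row in P) for j in range(len(P[0])))


# Hypothetical matrix: nonnegative entries, every column sum below 1.
A = [[0.2, 0.3, 0.1],
     [0.4, 0.1, 0.3],
     [0.1, 0.2, 0.2]]

r = max_column_sum(A)  # plays the role of r in the bound on column sums
power = A
bounds_hold = True
largest_entry = None
for k in range(1, 31):
    # Column sums of A^k should not exceed r^k (small float tolerance).
    if max_column_sum(power) > r ** k + 1e-12:
        bounds_hold = False
    largest_entry = max(max(row) for row in power)  # largest element of A^k
    power = matmul(power, A)

print("r =", r)
print("bound held for k = 1..30:", bounds_hold)
print("largest element of A^30:", largest_entry)
```

Because every element is nonnegative, each element of a power is itself bounded by its column sum, which is why the largest element shrinks at least as fast as the powers of r.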

Writing $(\mathbf{I} - \mathbf{A})^{-1}$ in the form of a power series (3-187) is of advantage,
particularly when I − A is of a high order; in such cases, the standard
numerical methods of inversion can introduce, through rounding off,
rather large errors, implying that $(\mathbf{I} - \mathbf{A})^{-1}$ cannot be determined
accurately. This problem of rounding off is considerably reduced if
$(\mathbf{I} - \mathbf{A})^{-1}$ is evaluated by selecting a finite number of terms in the power
series expansion. The inverse can be computed to any desired degree of
accuracy by taking a sufficient number of terms in the expansion.
Unfortunately, the series does not always converge rapidly, and a rather
large number of terms must be used to obtain a satisfactory degree of
accuracy. Berger and Saibel [3]* discuss other power series expansions
of $(\mathbf{I} - \mathbf{A})^{-1}$ which converge more rapidly.
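
To make the truncation idea concrete, here is a minimal sketch that sums a finite number of terms of the series and compares the result with the exact inverse. The 2 × 2 matrix is hypothetical (nonnegative elements, column sums below 1), the helper names are invented for this illustration, and the exact inverse is obtained from the usual 2 × 2 adjoint formula.

```python
def identity(n):
    """n x n identity matrix as a list of rows."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def matmul(P, Q):
    """Product of two matrices given as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))]
            for i in range(len(P))]


def power_series_partial_sum(A, last_power):
    """Return I + A + A^2 + ... + A^last_power."""
    n = len(A)
    total = identity(n)
    power = identity(n)
    for _ in range(last_power):
        power = matmul(power, A)
        total = [[total[i][j] + power[i][j] for j in range(n)]
                 for i in range(n)]
    return total


def inverse_2x2(M):
    """Inverse of a nonsingular 2 x 2 matrix via the adjoint formula."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def max_abs_diff(P, Q):
    """Largest absolute elementwise difference between P and Q."""
    return max(abs(p - q) for rp, rq in zip(P, Q) for p, q in zip(rp, rq))


A = [[0.2, 0.3],
     [0.4, 0.1]]  # hypothetical matrix, column sums 0.6 and 0.4
I_minus_A = [[1 - A[0][0], -A[0][1]],
             [-A[1][0], 1 - A[1][1]]]
exact = inverse_2x2(I_minus_A)

# Truncation error for increasing numbers of terms.
errors = {K: max_abs_diff(power_series_partial_sum(A, K), exact)
          for K in (2, 10, 40)}
for K, err in errors.items():
    print("terms up to A^%d: max error %.3e" % (K, err))
```

The error falls as more terms are kept, but how quickly depends on the matrix; a matrix whose column sums are close to 1 would need many more terms.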


* Numbers in brackets denote bibliographical references. 




References 

1. A. C. Aitken, Determinants and Matrices. Edinburgh: Oliver and Boyd, 
1948. 

This little book gives an excellent, concise discussion of matrices and
determinants. It is quite readable and approaches the subject from an
elementary point of view.

2. R. G. D. Allen, Mathematical Economics. London: Macmillan, 1956. 

3. W. J. Berger and E. Saibel, “Power Series Inversions of the Leontief
Matrix.” Econometrica, 25: 1, 1957, pp. 154-165.

4. G. Birkhoff and S. MacLane, A Survey of Modern Algebra. New York: 
Macmillan, 1941. 

5. R. Courant and D. Hilbert, Methods of Mathematical Physics, Volume I.
New York: Interscience, 1953.

Chapter 1 of this famous work gives a brief but powerful discussion of linear
algebra. Power series inversions of matrices of the form $\mathbf{I} - \lambda\mathbf{A}$ are also
considered.

6. W. L. Ferrar, Algebra. Oxford: Oxford University Press, 1941. 

This book gives an elementary, clear discussion of matrices, determinants, 
and quadratic forms. 

7. E. A. Guillemin, The Mathematics of Circuit Analysis. New York: 
Wiley, 1949. 

Guillemin discusses matrix theory and its application to circuit analysis. A 
good intuitive feeling for the subject is conveyed to the reader. 

8. P. R. Halmos, Finite-Dimensional Vector Spaces. Princeton: Van Nostrand, 
1958. 

9. F. B. Hildebrand, Methods of Applied Mathematics. Englewood Cliffs: 
Prentice-Hall, 1952. 

Chapter 1 gives an excellent treatment of matrices and determinants. It is 
easy to understand and covers a number of useful topics often absent from 
pure mathematics texts. 

10. P. LeCorbeiller, Matrix Analysis of Electrical Networks. New York: 
Wiley, 1950. 

11. O. Morgenstern (ed.), Economic Activity Analysis. New York: Wiley,
1954.

Part II presents a rather complete study of the properties of Leontief matrices. 

12. S. Perlis, Theory of Matrices. Reading, Mass.: Addison-Wesley, 1952. 

13. R. M. Thrall and L. Tornheim, Vector Spaces and Matrices. New York: 
Wiley, 1957. 



Problems 


3-1. When addition is defined, add A and B in the following cases: 


(a) A = 

(b) A = [3, 2], 


CO 

1- 

4 

5 

w 

II 

--1 

CD 

7 

i- 

L2 

1 

6. 


L« 

1 

-1 

00 


B = 


(c)A=[ an H» B=[ 611 

[a,21 0'22_ _&21 


&12 

b22 


(d) A = 


B - 


(e) A = 


- f° 1 1, 
0 


B = 


7 4 
1 0 


(f) A = 


0 

1- 

o 


o 

0 

0 

1- 

o 

1 

0 

, B = 

0 

0 

0 

0 

0 

1. 


0 

0 

0 

0_ 


3-2. Let $\mathbf{A} = \|a_{ij}\|$ be an $m \times n$ matrix and $\mathbf{B} = \|b_{ij}\|$ an $m \times n$ matrix.
Show by actually writing out the sums that A + B = B + A.

3-3. Let A, B, C be $m \times n$ matrices. Show by actually writing out the sums
that A + (B + C) = (A + B) + C.

3-4. Consider the addition of sets of linear equations of the form Ax = b,
Bx = d and show that our definitions of matrix addition and multiplication
by a scalar are consistent with the corresponding operations on sets of
simultaneous linear equations. In addition, show how the restrictions on the number
of rows and columns are a logical development. Note carefully what addition
means in this context. If x satisfies both Ax = b and Bx = d, then x is a
solution to (A + B)x = b + d.

3-5. Find the products AB and BA (when they are defined): 
(b) A =


1 2 
3 1 


4 5 
0 2 


B 



"l 3 4 


i- 

o 

<N 

O 

i-H 

1_ 

(c) A = 

2 0 7 

, B = 

7 1 3 


5 6 9 


CD 

iO 

_1 


(d) A = 


(e) A = 


(0 A = 


1 
0 

7 

8 
1 
0 

7 

8 

3 1 
2 4 
5 6 


B = (2, 4, 9, 6, 5,0); 


B = (2/4, 9, 6); 


B-f 1 ' 

2 


3-6. Show by actual computation that (AB)C = A(BC) when 







3 

7 

r 

II 

W \ 

t-H 

'o 

3 

a 

c = 

2 

6 

i 

L3 4j 

_1 

8 

9j 


_1 

4 

0_ 


3-7. Given the diagonal matrices $\mathbf{A} = \|a_i\delta_{ij}\|$, $\mathbf{B} = \|b_i\delta_{ij}\|$, compute AB
and BA. What is true of AB and BA?

3-8. Prove that, when the operations are defined, it is always true that
(A + B)C = AC + BC.

3-9. Given matrices A, B, C, D, and assuming that all operations are defined, 
prove from the definition of multiplication that 


(A + B)(C + D) = A(C + D) + B(C + D) = AC + AD + BC + BD. 


Under what conditions are all the operations defined? 
3-10. Given matrices A, B; under what conditions do the following equations
hold?

(a) $(\mathbf{A} + \mathbf{B})^2 = \mathbf{A}^2 + 2\mathbf{A}\mathbf{B} + \mathbf{B}^2$;

(b) $(\mathbf{A} + \mathbf{B})(\mathbf{A} - \mathbf{B}) = (\mathbf{A} - \mathbf{B})(\mathbf{A} + \mathbf{B}) = \mathbf{A}^2 - \mathbf{B}^2$.


3-11. Given an nth-order diagonal matrix $\mathbf{D} = \|d_i\delta_{ij}\|$ and an $n \times n$ matrix
A. If AD = DA, what type of matrix is A? Consider a case where all $d_i$ are
different, and a case where some $d_i$ may be the same. When two or more $d_i$ are
equal, assume that they fall in consecutive rows. Sketch the structure of A
when some $d_i$ are equal.

3-12. Given two symmetric matrices A, B of order n. When is the product 
AB also symmetric? 

3-13. For any matrix A show that A'A is defined and is a symmetric matrix. 

3-14. Given a symmetric matrix A and a skew-symmetric matrix B, both of 
order n. Show that AB is skew-symmetric if A and B commute. 

3-15. Given two skew-symmetric matrices A and B of order n. Show that 
AB is skew-symmetric if and only if AB = —BA. Matrices for which AB = 
—BA are said to anticommute. When is the product of two skew-symmetric 
matrices symmetric? 

3-16. Given the following matrices: 


- 1 

to 

l" 

W 

II 

"l 

7 

1 - 

<N 

L3 

4 


_0 

6 

5j 


Show by direct computation that $\mathbf{A}\mathbf{B} = (\mathbf{A}\mathbf{b}_1, \mathbf{A}\mathbf{b}_2, \mathbf{A}\mathbf{b}_3)$. The $\mathbf{b}_i$ are, of course,
the column vectors of B.

3-17. Matrices A and B are partitioned as shown: 


3 

2 

1 

4" 


4 

6 

5 

0 

, B = 

7 

1 

! 0 

2 



1 7 
0 6 
1 2 
5 1 


Prove by direct multiplication and by block multiplication that the same result 
AB is obtained either way. 

3-18. Write out the expansion of the general fourth-order determinant (use
the basic definition). Combine terms so that an expansion by the second row
is obtained. Show that the cofactors $A_{2j}$ are simply $(-1)^{j+2}$ times the
third-order determinants which were arrived at by crossing out the second row and
jth column. Verify thus the expansion by cofactors for this special case.

3-19. Evaluate

$$\begin{vmatrix} 4 & 1 & 2 \\ 7 & 3 & 5 \\ 1 & 6 & 6 \end{vmatrix}.$$

(a) Expand by the first row;

(b) expand by the second column.

3-20. By actual computation show that |A| = |A'| when

$$|\mathbf{A}| = \begin{vmatrix} 4 & 1 & 6 \\ 7 & 2 & 9 \\ 3 & 0 & 8 \end{vmatrix}.$$

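3-21. Using cofactor expansion by the first row, show that

$$\begin{vmatrix} 1 & 3 & 1 \\ 7 & 0 & 7 \\ 6 & 4 & 6 \end{vmatrix} = 0.$$

3-22. Show by computation that

$$\begin{vmatrix} 5 & 6 & 1 \\ 5 & 2 & 3 \\ 11 & 5 & 0 \end{vmatrix} = \begin{vmatrix} 3 & 6 & 1 \\ 4 & 2 & 3 \\ 8 & 5 & 0 \end{vmatrix} + \begin{vmatrix} 2 & 6 & 1 \\ 1 & 2 & 3 \\ 3 & 5 & 0 \end{vmatrix}.$$

3-23. Show by computation that

$$\begin{vmatrix} 4 & 6 & 0 \\ 2 & 7 & 3 \\ 1 & 5 & 1 \end{vmatrix} = \begin{vmatrix} 4 & 6 & 0 \\ 2 + 2(3) & 7 & 3 \\ 1 + 2(1) & 5 & 1 \end{vmatrix}.$$

3-24. Show by computation that

$$\begin{vmatrix} 1 & 3 & 4 \\ 7 & 3 & 1 \\ 2 & 9 & 4 \end{vmatrix} = -\begin{vmatrix} 4 & 3 & 1 \\ 1 & 3 & 7 \\ 4 & 9 & 2 \end{vmatrix}.$$

3-25. Prove by induction that for any positive integer n (when operations
are defined),

$$(\mathbf{A}_1\mathbf{A}_2 \cdots \mathbf{A}_n)' = \mathbf{A}_n' \cdots \mathbf{A}_2'\mathbf{A}_1'.$$

Note: To prove by induction that a relation holds true for all positive integers n,
we show first that the relation is true for n = 1. Then we demonstrate that,
if the relation holds for n − 1, it is also valid for n.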

3-26. Prove that the only n X n matrices which commute with all other n X n 
matrices are scalar matrices. 

3-27. Consider a product of matrices defined as $c_{ij} = a_{ij}b_{ij}$ (suggested in
this chapter and then set aside). What are the conditions under which
multiplication is defined? Is the multiplication commutative, associative, and
distributive?

3-28. Let



Evaluate the polynomial $\lambda_2\mathbf{A}^2 + \lambda_1\mathbf{A} + \lambda_0\mathbf{I}$ when $\lambda_2 = 2$, $\lambda_1 = 3$, $\lambda_0 = 5$.

3-29. Show that if AB = 0 and B ≠ 0, there is no matrix C such that
CA = I.

3-30. Expand $(\mathbf{A} + \mathbf{B})^4$ and $(\mathbf{A} - \mathbf{B})^4$. Be careful to note that A and B
do not commute in general.

3-31. Prove that the determinant of the scalar matrix $\lambda\mathbf{I}_n$ is $\lambda^n$ and that the
determinant of a diagonal matrix is the product of the diagonal elements.

3-32. A triangular matrix is defined as one where all elements below (or
equivalently above) the main diagonal are zero. Such a matrix is square, and
$a_{ij} = 0$ ($i > j$). Prove that the determinant of a triangular matrix is the
product of the elements on the main diagonal.

3-33. If two nth-order matrices A and B differ only in their jth column, prove
that $2^{1-n}|\mathbf{A} + \mathbf{B}| = |\mathbf{A}| + |\mathbf{B}|$.


3-34. Prove by induction that for the product of nonsingular matrices,

$$(\mathbf{A}_1\mathbf{A}_2 \cdots \mathbf{A}_n)^{-1} = \mathbf{A}_n^{-1}\mathbf{A}_{n-1}^{-1} \cdots \mathbf{A}_2^{-1}\mathbf{A}_1^{-1}.$$

3-35. Prove that the inverse of a nonsingular symmetric matrix is symmetric.

3-36. Prove that the inverse of a nonsingular skew-symmetric matrix is
skew-symmetric.

3-37. Prove that every skew-symmetric matrix of odd order is singular.

3-38. If $\mathbf{A}^+$ is the adjoint of a symmetric matrix A, prove that $\mathbf{A}^+$ is
symmetric, that is, the cofactor of $a_{ij}$ is the same as the cofactor of $a_{ji}$.

3-39. Show by an example that, in general,


$$(\mathbf{A} + \mathbf{B})^{-1} \neq \mathbf{A}^{-1} + \mathbf{B}^{-1}.$$


3-40. Using $\mathbf{A}^{-1} = (1/|\mathbf{A}|)\mathbf{A}^+$, compute the inverse of the following matrices:




'4 1 2 



"2 1 

; (b) A = 

0 10 

; (c) A = 

5 f 

_3 4 


1- 

00 

C* 

1_ 


6 3. 


(d) A = (4). 


3-41. Show that the inverse of the scalar matrix $\mathbf{S} = \lambda\mathbf{I}$ is $\mathbf{S}^{-1} = (1/\lambda)\mathbf{I}$.

3-42. Show that the inverse of the diagonal matrix $\mathbf{D} = \|\lambda_i\delta_{ij}\|$ is
$\mathbf{D}^{-1} = \|(1/\lambda_i)\delta_{ij}\|$.

3-43. Show that the inverse of a nonsingular triangular matrix T is triangular
and, by considering $\mathbf{T}^{-1}\mathbf{T} = \mathbf{I}$, obtain a set of equations which can be solved
sequentially to yield the elements of $\mathbf{T}^{-1}$. Illustrate this by computing the
inverse of

$$\mathbf{T} = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ 0 & a_{22} & a_{23} \\ 0 & 0 & a_{33} \end{bmatrix}.$$

In particular, observe that if $a_{ii}$ is a diagonal element of T, and $t_{ii}$ is the
corresponding diagonal element of $\mathbf{T}^{-1}$, then $t_{ii} = 1/a_{ii}$.

3-44. Show that interchanging rows in a nonsingular matrix A interchanges
the corresponding columns of $\mathbf{A}^{-1}$.

3-45. If A is a given matrix and B is a nonsingular matrix (both of nth-order),
prove that there is one and only one matrix X such that A = BX, and only one
matrix Y such that A = YB. In fact,

$$\mathbf{X} = \mathbf{B}^{-1}\mathbf{A}; \qquad \mathbf{Y} = \mathbf{A}\mathbf{B}^{-1}.$$


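3-46. Compute the inverse of the partitioned matrix

$$\begin{bmatrix} \mathbf{B} & \mathbf{0} \\ \mathbf{D} & 1 \end{bmatrix}$$

when B is $n \times n$, D is $1 \times n$, 0 is $n \times 1$, 1 is $1 \times 1$.

3-47. Compute by partitioning the inverse of the following matrix. Also
find a product form of the inverse.

$$\mathbf{A} = \begin{bmatrix} 4 & 2 & 1 & 8 \\ 7 & 9 & 4 & 3 \\ 1 & 0 & 5 & 2 \\ 6 & 6 & 1 & 7 \end{bmatrix}.$$

3-48. Obtain a product form of the inverse of

$$\mathbf{A} = \begin{bmatrix} 0 & 1 & 2 \\ 3 & 9 & 7 \\ 2 & 1 & 6 \end{bmatrix}.$$

3-49. If B is written as a column of row vectors $\mathbf{B} = [\mathbf{b}^1, \ldots, \mathbf{b}^m]$, prove that

$$\mathbf{B}\mathbf{C} = [\mathbf{b}^1\mathbf{C}, \ldots, \mathbf{b}^m\mathbf{C}],$$

when the multiplication is defined.

3-50. Show that if A is the partitioned matrix

$$\mathbf{A} = \begin{bmatrix} \mathbf{A}_{11} & \mathbf{A}_{12} \\ \mathbf{A}_{21} & \mathbf{A}_{22} \end{bmatrix},$$

then

$$\mathbf{A}' = \begin{bmatrix} \mathbf{A}_{11}' & \mathbf{A}_{21}' \\ \mathbf{A}_{12}' & \mathbf{A}_{22}' \end{bmatrix}.$$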
3-51. Consider that the $m \times n$ matrix $\mathbf{E}_{mn}$ is defined as a matrix all of whose
elements have the value of unity. For example, the $2 \times 2$ matrix $\mathbf{E}_{22}$ would be

$$\mathbf{E}_{22} = \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix}.$$

What is the matrix $\mathbf{E}_{nm}\mathbf{A}$ for any $m \times r$ matrix A? What is the matrix $\mathbf{A}\mathbf{E}_{rn}$?

*3-52. Consider the matrix

$$\mathbf{A} = \begin{bmatrix} 0.1 & 0.2 \\ 0.3 & 0.2 \end{bmatrix}.$$

Evaluate $(\mathbf{I} - \mathbf{A})^{-1}$. Eliminating the remaining terms in the expansion of the
inverse, compute $\mathbf{I} + \mathbf{A} + \mathbf{A}^2$.

3-53. Consider all operations defined for vectors and matrices and show in 
detail that there is a complete equivalence between vectors and matrices of one 
row or column. 

3-54. How does a symmetric matrix simplify the procedure of computing the
inverse of a matrix by partitioning (discussed in Section 3-20)? List the
simplifications.

3-55. Show that there is a complete correspondence between the scalar
matrices $\lambda\mathbf{I}_n$ and the real numbers $\lambda$. Consider addition, multiplication,
inverses, etc. A correspondence of the type $\lambda\mathbf{I}_n \to \lambda$ is called an isomorphism.
The systems represent indeed the same thing, except for notation. For example,
show that if $\lambda_1\lambda_2 = \lambda_3$, then $\lambda_1\mathbf{I}_n(\lambda_2\mathbf{I}_n) = \lambda_3\mathbf{I}_n$.

3-56. Consider the $1 \times 1$ matrices $(\lambda)$ containing the single element $\lambda$. By
examining all the rules for matrix operations, show that there is no difference
between a $1 \times 1$ matrix $(\lambda)$ and the real number $\lambda$, that is, we can write $(\lambda) = \lambda$.

3-57. Show that if A is skew-symmetric, B'AB is also skew-symmetric.

3-58. Find the matrix B whose inverse is

$$\mathbf{B}^{-1} = \begin{bmatrix} 4 & 3 & 6 \\ 1 & 5 & 7 \\ 2 & 9 & 1 \end{bmatrix}.$$

3-59. In Problem 3-58, we replace the first column of B by the vector
a = [2, 0, 7]. Compute the inverse of the new matrix $\mathbf{B}_a$.

3-60. Given the nonsingular matrix $\mathbf{B} = (\mathbf{b}_1, \ldots, \mathbf{b}_n)$. Show that

$$\mathbf{B}^{-1}\mathbf{b}_i = \mathbf{e}_i.$$

3-61. If B is a nonsingular matrix, column r of B is $\mathbf{e}_r$, and row r of B is $\mathbf{e}_r'$,
show that column r of $\mathbf{B}^{-1}$ is $\mathbf{e}_r$, and row r of $\mathbf{B}^{-1}$ is $\mathbf{e}_r'$.


* Starred problems correspond to starred sections in the text. 
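*3-62. Suppose that a square matrix A can be written in partitioned form as

$$\mathbf{A} = \begin{bmatrix} \mathbf{A}_1 & \mathbf{0} & \cdots & \mathbf{0} \\ \mathbf{0} & \mathbf{A}_2 & \cdots & \mathbf{0} \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf{0} & \mathbf{0} & \cdots & \mathbf{A}_n \end{bmatrix},$$

where the only nonzero submatrices are square and appear on the main diagonal.
Prove that

$$|\mathbf{A}| = |\mathbf{A}_1|\,|\mathbf{A}_2| \cdots |\mathbf{A}_n|.$$

*3-63. Expand by the Laplace expansion method, using the first and third rows:

$$\begin{vmatrix} 3 & 4 & 1 & 5 \\ 2 & 8 & 7 & 6 \\ 1 & 0 & 5 & 4 \\ 9 & 9 & 1 & 3 \end{vmatrix}.$$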

*3-64. Show that an nth-order determinant can be written

$$\begin{vmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & & \vdots \\ a_{n1} & \cdots & a_{nn} \end{vmatrix} = a_{11}A_{11} + \begin{vmatrix} 0 & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{vmatrix}.$$

Each of the submatrices whose determinants yield the cofactors $A_{12}, \ldots, A_{1n}$
contains the elements $a_{21}, \ldots, a_{n1}$ of the first column of A. Denote the
cofactor of $a_{i1}$ in $A_{1j}$ by $A_{1j;i1}$ (this is a determinant of order n − 2). Note that
$A_{1j;i1}$ involves the determinant of the submatrix formed from A by crossing out
rows 1, i and columns 1, j, as does also $A_{11;ij}$ which, in turn, is the cofactor
of $a_{ij}$ in $A_{11}$. Thus $A_{11;ij}$ and $A_{1j;i1}$ can differ only in sign. Show that

$$A_{11;ij} = -A_{1j;i1}.$$

Now prove that |A| can be expanded in the form

$$|\mathbf{A}| = a_{11}A_{11} - \sum_{i,j} a_{1j}a_{i1}A_{11;ij} \quad (i, j \neq 1).$$

This is the famous Cauchy expansion of a determinant.

*3-65. Generalize the Cauchy expansion discussed in Problem 3-64 so that
the expansion is made in terms of column h and row k.

*3-66. By means of the Cauchy method expand in terms of the first row and
column:

$$\begin{vmatrix} 1 & 2 & 3 & 4 \\ 2 & 0 & 7 & 1 \\ 5 & 9 & 0 & 6 \\ 6 & 1 & 1 & 2 \end{vmatrix}.$$

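*3-67. Given:

$$\mathbf{A} = \begin{bmatrix} a_{11} & a_{12} & a_{13} & a_{14} & a_{15} \\ a_{21} & a_{22} & a_{23} & a_{24} & a_{25} \\ a_{31} & a_{32} & a_{33} & a_{34} & a_{35} \end{bmatrix}, \qquad \mathbf{B} = \begin{bmatrix} b_{11} & b_{12} & b_{13} \\ b_{21} & b_{22} & b_{23} \\ b_{31} & b_{32} & b_{33} \\ b_{41} & b_{42} & b_{43} \\ b_{51} & b_{52} & b_{53} \end{bmatrix}.$$

Write |AB| as a sum of products of $3 \times 3$ determinants taken from A and B.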

*3-68. Consider |AA'|. Using the theorem on the determinant of the product 
of rectangular matrices, show what form the expansion of this determinant 
takes when A is m X n, m < n. 

*3-69. A matrix has no numerical value. On occasions, however (for example,
when treating infinite series of matrices), it is useful to define a real-valued
function of a matrix. This can be done in many ways, e.g.: The modulus of an
$m \times n$ matrix A, written M(A), is defined as



where $|a_{ij}|$ is the absolute value of $a_{ij}$.
Find M(A) for the following matrices:


2 4 3‘ 

-7 0 2 

> 

11 

o' 

rfv 

CO 

II 

< 

0 o' 

0 0 

> 

II 

"0 0 ' 

3 12 


2 1 5. 


0 

--I 


_o 0 . 


*3-70. Referring to Problem 3-69, prove that M(A) satisfies the following
expressions:

(1) $M(\lambda\mathbf{A}) = |\lambda|M(\mathbf{A})$ for any scalar $\lambda$;

(2) $M(\mathbf{I}) = 1$;

(3) $M(\mathbf{A}\mathbf{B}) \leq M(\mathbf{A})M(\mathbf{B})$;

(4) $M(\mathbf{A} + \mathbf{B}) \leq M(\mathbf{A}) + M(\mathbf{B})$;

(5) $M(\mathbf{A}) - M(\mathbf{B}) \leq M(\mathbf{A} + \mathbf{B})$.

Hint: For (5) write A = (A + B) + (−B) and apply (4) to this equation.

*3-71. Given an infinite sequence of matrices $\mathbf{A}_k$. Prove that if

$$\lim_{k\to\infty} M(\mathbf{A} - \mathbf{A}_k) = 0,$$

then

$$\lim_{k\to\infty} \mathbf{A}_k = \mathbf{A},$$

and

$$\lim_{k\to\infty} M(\mathbf{A}_k) = M(\mathbf{A}).$$


*3-72. Express $|\mathbf{A}^{-1}|$, $|\mathbf{A}^+|$ in terms of |A|.

*3-73. Consider an $m \times n$ matrix A such that every column of A contains
either only zero elements or zero elements throughout except one which has the
value unity. Show that every minor of A either has the value 0 or ±1.

3-74. Given AB = AC; does it follow that B = C? Can you provide a
counterexample?

3-75. If the matrix A of Problem 3-62 is nonsingular, compute $\mathbf{A}^{-1}$.

3-76. Derive the cofactor expansion by column j from the result on the
expansion by row j, using the fact that |A| = |A'|.

*3-77. In the general expansion of a determinant of order 4, find all the terms
containing $a_{11}a_{22}$ and $a_{12}a_{21}$. Show that, except for sign, the same set of terms
is obtained in each case after $a_{11}a_{22}$ or $a_{12}a_{21}$ is factored out. Show that the
set of terms from which $a_{12}a_{21}$ is factored out has the opposite sign from that
corresponding to $a_{11}a_{22}$. Next, show that the set of terms which multiplies
$a_{11}a_{22}$ is the determinant obtained from A by crossing out rows and columns
1, 2. Do the same thing for terms containing $a_{3u}a_{4v}$, where $(u, v)$ is (1, 2) or
(2, 1). Verify that the sign is obtained according to the rules for the Laplace
expansion.


Problems Involving Matrices With Complex Elements 

3-78. Review the chapter and list the most important results. Show that 
the results remain true for matrices whose elements are complex numbers. 

3-79. Compute AB, BA for 


A = 

2 + i 4 + 3* i 

; » = 

1 ° 
00 

1_ 

5 i 

—2i 


.7 6+9* 1 — 


3 + 2 i 

4 + 5 i_ 


3-80. Compute |A|, $\mathbf{A}^{-1}$ for

$$\mathbf{A} = \begin{bmatrix} 1 + i & 2i \\ 4 - 6i & 2 + 3i \end{bmatrix}.$$

Show that $\mathbf{A}\mathbf{A}^{-1} = \mathbf{A}^{-1}\mathbf{A} = \mathbf{I}$. Note that the definition of the identity matrix
does not need to be changed when the theory is generalized to include matrices
with complex elements.

3-81. Compute |A|, $\mathbf{A}^{-1}$ for

$$\mathbf{A} = \begin{bmatrix} 3i & 5 & 6 + 7i \\ 4 + 2i & 1 + i & 3 - 5i \\ 3 & 2 - i & 6 + i \end{bmatrix}.$$

3-82. Compute a product form of the inverse for the A of Problem 3-80.

3-83. Given a matrix $\mathbf{A} = \|a_{ij}\|$ with complex elements, then the matrix
A* is defined to be $\mathbf{A}^* = \|a_{ij}^*\|$; A* is formed from A by taking the complex
conjugate of each element in A. Hence A* may be called the conjugate matrix to A.
When the elements of a matrix are complex, symmetric matrices are of less
interest than so-called Hermitian matrices. A Hermitian matrix is a matrix A
such that A = (A*)'. The matrix is equal to its conjugate transpose. Note that
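if the elements are real, a Hermitian matrix is symmetric. The matrix (A*)'
will be denoted by $\tilde{\mathbf{A}}$. For any matrix A, $\tilde{\mathbf{A}}$ is called the associate matrix of A.
Prove that $(\mathbf{A}\mathbf{B}\mathbf{C})^* = \mathbf{A}^*\mathbf{B}^*\mathbf{C}^*$ and $\widetilde{\mathbf{A}\mathbf{B}\mathbf{C}} = \tilde{\mathbf{C}}\tilde{\mathbf{B}}\tilde{\mathbf{A}}$.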

3-84. Show that the following is a Hermitian matrix:

$$\mathbf{A} = \begin{bmatrix} 2 & 1 + i & 3 + 4i \\ 1 - i & 4 & 2 - 3i \\ 3 - 4i & 2 + 3i & 5 \end{bmatrix}.$$

Prove that the diagonal elements of a Hermitian matrix are always real.

3-85. Compute $\mathbf{A}^2$ for the matrix A of Problem 3-84. If A is a Hermitian
matrix, what can be said about the elements of $\mathbf{A}^2$?

3-86. Prove that the inverse of a nonsingular Hermitian matrix is Hermitian.
Under what conditions is the product of two Hermitian matrices Hermitian?
LINEAR TRANSFORMATIONS,

RANK AND ELEMENTARY TRANSFORMATIONS


Be ye transformed by the renewing of your mind.

Rom. XII-2.

4-1 Definition of linear transformations. Transformations of variables
in which the new variables are linear combinations of the original ones
are often needed in working with linear models. In fact, transformations
of this type were used in the preceding chapter to obtain the general
definition of matrix multiplication. These linear transformations of
variables and matrix theory are closely connected. Furthermore, linear
transformations can be given an interesting geometric interpretation which, in
turn, contributes materially to our intuitive understanding of many of the
properties of matrix operations. We shall see also that a more detailed
discussion of linear transformations will allow us to develop some
additional significant concepts in matrix algebra.

An especially simple example of a linear transformation of variables is

$$\begin{aligned} y_1 &= a_{11}x_1 + a_{12}x_2 \\ y_2 &= a_{21}x_1 + a_{22}x_2 \end{aligned} \qquad \text{or} \qquad \mathbf{y} = \mathbf{A}\mathbf{x}. \tag{4-1}$$

The vector $\mathbf{x} = [x_1, x_2]$ can be viewed as a point in the $x_1x_2$-plane.
The change of variables (4-1) serves to transform each point in the
$x_1x_2$-plane into a point in the $y_1y_2$-plane. A transformation of this kind is
called a mapping of the $x_1x_2$-plane into all or part of the $y_1y_2$-plane. The
vector x is mapped into the vector y. The point y is called the image of x.
The matrix A induces the mapping.

There is no reason at all for considering the $y_1y_2$-plane as necessarily
different or distinct from the $x_1x_2$-plane. They can be assumed to be
identical, with the $y_1$- and $y_2$-axes being the same as the respective $x_1$-
and $x_2$-axes. Hence, the $x_1x_2$- and $y_1y_2$-planes can be considered to be
simply different names for the same thing; according to this
interpretation, the transformation (4-1) moves a point in the $x_1x_2$-plane into
another point in the $x_1x_2$-plane.

Linear transformations of the type (4-1) have some interesting
properties. If

$$\mathbf{y}_1 = \mathbf{A}\mathbf{x}_1, \qquad \mathbf{y}_2 = \mathbf{A}\mathbf{x}_2,$$

then

$$\mathbf{y}_1 + \mathbf{y}_2 = \mathbf{A}\mathbf{x}_1 + \mathbf{A}\mathbf{x}_2 = \mathbf{A}(\mathbf{x}_1 + \mathbf{x}_2). \tag{4-2}$$

If $\mathbf{y}_1$, $\mathbf{y}_2$ are the images of $\mathbf{x}_1$, $\mathbf{x}_2$, respectively, then $\mathbf{y}_1 + \mathbf{y}_2$ is the image
of $\mathbf{x}_1 + \mathbf{x}_2$, that is, the operation of addition is preserved under the
transformation. Similarly, if $\mathbf{y}_1$ is the image of $\mathbf{x}_1$, then $\lambda\mathbf{y}_1$ is the image
of $\lambda\mathbf{x}_1$ since $\mathbf{A}(\lambda\mathbf{x}_1) = \lambda\mathbf{A}\mathbf{x}_1$. Multiplying x by a scalar also multiplies the
image of x by the same scalar. The transformation preserves
multiplication by a scalar.

If, in (4-1), A, the matrix of the coefficients, is nonsingular, we can
write $\mathbf{x} = \mathbf{A}^{-1}\mathbf{y}$. This means that one and only one point in the
$x_1x_2$-plane corresponds to each point in the $y_1y_2$-plane, just as according to
Eq. (4-1), one and only one point in the $y_1y_2$-plane corresponds to each
point in the $x_1x_2$-plane. Such a transformation is called a one-to-one (1-1)
transformation since only one y corresponds to a given x and only one x
to a given y. A transformation can be single-valued (that is, only one y
corresponds to a given x) without being 1-1, since there may be two or
more points x which are transformed into the same y. If $\mathbf{A}^{-1}$ exists, then
every point y has a corresponding unique x. Consequently, A maps the
$x_1x_2$-plane into all of the $y_1y_2$-plane. Assuming that the $x_1x_2$-plane and
the $y_1y_2$-plane are identical, we see that a nonsingular linear
transformation maps $E^2$ onto $E^2$ (all of $E^2$, not just a part of $E^2$) in a 1-1 manner.

The ideas developed above carry over to the transformation y = Ax
when A is $m \times n$. In this case, each point of $E^n$ is transformed or mapped
into a point in $E^m$. (Of course, if $m \neq n$, then y, x cannot be considered
to be points in the same space.) Again, addition and multiplication by a
scalar are preserved under the transformation. Mathematicians prefer
to define the concept of a linear transformation abstractly in terms of the
properties of preserving addition and multiplication by a scalar.

Linear Transformation: A linear transformation T on the space $E^n$
is a correspondence which maps each vector x of $E^n$ into a vector T(x) of
$E^m$ (m can be >, =, < n) such that for all vectors $\mathbf{x}_1$, $\mathbf{x}_2$ in $E^n$ and all
scalars $\lambda_1$, $\lambda_2$,

$$T(\lambda_1\mathbf{x}_1 + \lambda_2\mathbf{x}_2) = \lambda_1T(\mathbf{x}_1) + \lambda_2T(\mathbf{x}_2). \tag{4-3}$$

Equation (4-3) expresses the preservation of both addition and
multiplication by a scalar. If we set $\lambda_1 = \lambda_2 = 1$, (4-3) becomes

$$T(\mathbf{x}_1 + \mathbf{x}_2) = T(\mathbf{x}_1) + T(\mathbf{x}_2);$$

i.e., the transformation preserves addition. If we take $\lambda_2 = 0$, (4-3)
becomes

$$T(\lambda_1\mathbf{x}_1) = \lambda_1T(\mathbf{x}_1);$$

i.e., the transformation preserves multiplication by a scalar, which is often
referred to as the homogeneity property of linear transformations.

Any matrix transformation is a linear transformation since the rules
for matrix operations establish that

$$\mathbf{A}(\lambda_1\mathbf{x}_1 + \lambda_2\mathbf{x}_2) = \lambda_1\mathbf{A}\mathbf{x}_1 + \lambda_2\mathbf{A}\mathbf{x}_2.$$
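
The linearity of a matrix transformation can be checked on a concrete case. The sketch below uses a hypothetical 2 × 3 integer matrix, two vectors, and two scalars (all chosen here for illustration) and compares both sides of the relation above.

```python
def matvec(A, x):
    """Return the image Ax of the vector x under the matrix A (list of rows)."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def lincomb(l1, x1, l2, x2):
    """Return the vector l1*x1 + l2*x2."""
    return [l1 * a + l2 * b for a, b in zip(x1, x2)]


A = [[2, -1, 0],
     [1, 3, 4]]            # hypothetical 2 x 3 matrix: maps E^3 into E^2
x1, x2 = [1, 2, 3], [-2, 0, 5]
l1, l2 = 3, -4

lhs = matvec(A, lincomb(l1, x1, l2, x2))              # A(l1 x1 + l2 x2)
rhs = lincomb(l1, matvec(A, x1), l2, matvec(A, x2))   # l1 A x1 + l2 A x2
print(lhs, rhs)  # both sides give [16, -15]
```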

In fact, the algebra of matrices is often called the algebra of linear
transformations. We shall always identify linear transformations with matrix
transformations. The general definition might give the impression that
there could exist linear transformations on vector spaces other than
matrix transformations. This is not so. In Problem 4-25 you will be
required to prove that every linear transformation on a vector space is
equivalent to, and can be represented by, a matrix transformation. We
shall never have any direct use for the abstract definition (4-3) of a linear
transformation. However, (4-3) does indeed provide the general
definition of linearity and can be used to define linear differential or difference
equations, linear servomechanisms, etc. In general, a physical or economic
model can be cast into the form T(x) = y. This simply means that a set
of variables, described by the vector x, is related to another set of variables
or known parameters by the transformation T. The model is linear if
(4-3) holds. We often refer to T as an operator which transforms x into y.
The operator T may involve also other variables, such as time (when
T(x) = y is a set of differential equations and x is to be determined as a
function of time), etc.

Examples: (1) The transformation $y = ax$ is linear. To prove this,
we only need to show that (4-3) holds. In this case, $T(x) = ax$. Thus,

$$T(\lambda_1x_1 + \lambda_2x_2) = a(\lambda_1x_1 + \lambda_2x_2) = \lambda_1(ax_1) + \lambda_2(ax_2) = \lambda_1T(x_1) + \lambda_2T(x_2).$$

(2) The transformation $y = ax^2$ is not linear, since

$$T(\lambda_1x_1 + \lambda_2x_2) = a(\lambda_1x_1 + \lambda_2x_2)^2 = a(\lambda_1x_1)^2 + a(\lambda_2x_2)^2 + 2a\lambda_1\lambda_2x_1x_2$$

$$\neq \lambda_1T(x_1) + \lambda_2T(x_2) = a\lambda_1x_1^2 + a\lambda_2x_2^2.$$

(3) The transformation $y = a_1x + a_2$, $a_2 \neq 0$, is not linear, since

$$T(\lambda_1x_1 + \lambda_2x_2) = a_1(\lambda_1x_1 + \lambda_2x_2) + a_2 = a_1\lambda_1x_1 + a_1\lambda_2x_2 + a_2$$

$$\neq \lambda_1T(x_1) + \lambda_2T(x_2) = \lambda_1(a_1x_1 + a_2) + \lambda_2(a_1x_2 + a_2)$$

for all $\lambda_1$, $\lambda_2$. The constant $a_2$ spoils the linearity of the transformation.
It should be recalled that no such constants appear in the general matrix
transformation y = Ax. For a transformation to be linear, every
constant must multiply a variable.
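
The three examples can also be tested numerically. The sketch below evaluates the difference between the two sides of (4-3) at one sample point for each transformation; the constants and sample values are hypothetical. A nonzero difference shows that a transformation is not linear, whereas a zero difference at a single point is only consistent with linearity and does not prove it.

```python
def superposition_gap(T, l1, x1, l2, x2):
    """Return T(l1*x1 + l2*x2) - (l1*T(x1) + l2*T(x2)).

    The result is zero whenever (4-3) holds at this particular point.
    """
    return T(l1 * x1 + l2 * x2) - (l1 * T(x1) + l2 * T(x2))


a, a1, a2 = 3, 3, 1  # hypothetical constants, with a2 != 0


def scale(x):
    """Example (1): y = a x."""
    return a * x


def square(x):
    """Example (2): y = a x^2."""
    return a * x * x


def affine(x):
    """Example (3): y = a1 x + a2."""
    return a1 * x + a2


l1, x1, l2, x2 = 2, 1, 3, 4  # sample scalars and points
gaps = {T.__name__: superposition_gap(T, l1, x1, l2, x2)
        for T in (scale, square, affine)}
print(gaps)  # {'scale': 0, 'square': 438, 'affine': -4}
```

For the third example the gap equals $a_2(1 - \lambda_1 - \lambda_2)$, so a sample with $\lambda_1 + \lambda_2 = 1$ would not reveal the nonlinearity; the sample above avoids that case.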


4-2 Properties of linear transformations.

Domain and range: The domain of a transformation is defined to be
the set of elements which undergo transformation. The range of a
transformation is the set of elements which is formed by the transformation
operating on the elements in the domain.

The range is often called the image of the domain under the
transformation. In Eq. (4-1), the domain is the whole $x_1x_2$-plane, and the
range is the set of points in the $y_1y_2$-plane which is the image of the
$x_1x_2$-plane. When A is nonsingular, the range is the whole $y_1y_2$-plane.

The range of a linear transformation on $E^n$ is a subspace of $E^m$ or,
expressed in terms of a matrix transformation: If A is an $m \times n$ matrix, then
the set of points y = Ax (for all x in $E^n$) is a subspace of $E^m$. In general,
for any linear transformation T, we must demonstrate that if T(x) is in
the range, so is $\lambda T(\mathbf{x})$ for any scalar $\lambda$. This can be shown, since $\lambda T(\mathbf{x}) =
T(\lambda\mathbf{x})$ and $T(\lambda\mathbf{x})$ is the image of $\lambda\mathbf{x}$ and is in the range. Similarly, it must
be true that if $T(\mathbf{x}_1)$, $T(\mathbf{x}_2)$ are in the range, the sum $T(\mathbf{x}_1) + T(\mathbf{x}_2)$ is
in the range also. Since $T(\mathbf{x}_1) + T(\mathbf{x}_2) = T(\mathbf{x}_1 + \mathbf{x}_2)$ is the image of
$\mathbf{x}_1 + \mathbf{x}_2$, it is in the range. It may happen, of course, that the subspace
of $E^m$ is $E^m$ itself. A simple example will illustrate the implications of this
theorem: Suppose A is $3 \times 3$. Then the set of points y = Ax for all x in
$E^3$ must be either the origin, a line through the origin, a plane through
the origin, or all of $E^3$. In addition, this proof shows that a linear
transformation which takes points in $E^n$ into points in $E^m$ also takes a
subspace of $E^n$ into a subspace of $E^m$.

Any $m \times n$ matrix A can be written as a row of column vectors,
$\mathbf{A} = (\mathbf{a}_1, \ldots, \mathbf{a}_n)$. Thus, when a matrix A maps all of $E^n$ into $E^m$, we have

$$\mathbf{y} = \mathbf{A}\mathbf{x} = x_1\mathbf{a}_1 + \cdots + x_n\mathbf{a}_n, \tag{4-4}$$

where the $x_i$ can take on all possible values. The range of the transformation,
that is, the subspace generated by the y, is then the subspace of $E^m$ spanned
by the columns of A. We know from Section 2-13 that the dimension of
this subspace is the maximum number of linearly independent columns
in A. Thus the dimension of the range is the maximum number of linearly
independent columns in A.
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Examples: ( 1 ) The transformation induced on E 4 by 


can be written 


A = 


1 5 2 
0 0 3 
0 0 1 


4 

6 

2 



The range of the transformation is the two-dimensional subspace of E 3 
spanned by [1,0,0] and [2,3, 1]; that is, the range consists of all the 
vectors y lying in the plane through the origin and [1, 0, 0], [2, 3, 1] in E 3 . 
(2) The transformation represented by the matrix

$$ A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix} $$

takes the point $[x_1, x_2, x_3]$ into a point $[x_1, x_2]$ in the $x_1x_2$-plane. The
transformation then projects a point in $E^3$ on the $x_1x_2$-plane. The
transformation is not 1-1, since every point with the first two components
$x_1$, $x_2$ goes into $[x_1, x_2]$ regardless of what $x_3$ happens to be. Thus $A$ takes
$E^3$ into all of $E^2$.

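(3) The transformation represented by the matrix

$$ A = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 1 \end{bmatrix} $$

takes any point $[x_1, x_2]$ of $E^2$ into $[x_1, x_2, x_2]$ of $E^3$ (see Fig. 4-1). In
the process, the transformation carries the $x_1x_2$-plane into the plane obtained
by rotating it about the $x_1$-axis through a 45° angle (lengths along the
$x_2$-direction are stretched by a factor of $\sqrt{2}$). The range is a plane and
represents a two-dimensional subspace of $E^3$.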

It is important to note that a transformation which takes every point
of $E^n$ into a space of higher dimension $E^m$ ($m > n$) can never have a range
of dimension $m$. We understand intuitively that all of $E^m$ cannot be
filled from a space of lower dimension. The dimension of the subspace
of $E^m$, representing the range of the transformation, cannot be greater
than $n$. The dimension of the subspace is the maximum number of linearly
independent columns in $A$, and $A$ has only $n$ columns.

Figure 4-1

Let $T_1$ be a linear transformation which takes $E^n$ into a subspace of $E^r$,
and $T_2$ a linear transformation which takes $E^r$ into a subspace of $E^m$.
The product $T_3 = T_2T_1$ of the two linear transformations $T_1$, $T_2$ is
defined as follows:

$$ T_3(x) = T_2[T_1(x)]; \tag{4-5} $$

that is, we obtain $T_3(x)$ by computing $T_1(x) = y$ and applying $T_2$ to $y$.
The product of two linear transformations is also a linear transformation
since

$$ \begin{aligned} T_3(\lambda_1 x_1 + \lambda_2 x_2) &= T_2[\lambda_1 T_1(x_1) + \lambda_2 T_1(x_2)] \\ &= \lambda_1 T_2[T_1(x_1)] + \lambda_2 T_2[T_1(x_2)] \\ &= \lambda_1 T_3(x_1) + \lambda_2 T_3(x_2). \end{aligned} \tag{4-6} $$

If $T_1$ takes a point in $E^n$ into a point in $E^r$, and $T_2$ takes a point in $E^r$
into a point in $E^m$, then $T_3 = T_2T_1$ takes a point in $E^n$ into a point in $E^m$.

Given any two linear transformations $T_1$, $T_2$, with $T_1$ taking points in
$E^n$ into points in $E^r$, and $T_2$ taking points in $E^s$ into points in $E^m$, the
product $T_2T_1$ can be defined if and only if $r = s$. If matrices $A_1$, $A_2$
represent $T_1$, $T_2$, respectively, then $T_3$ is represented by matrix $A_3 =
A_2A_1$. The proof of this statement is left to Problem 4-29. Thus, a
product of matrices can be viewed as a sequence of linear transformations.
At this point, we wish to note that the product of two linear
transformations defines the rules for matrix multiplication. Indeed, in Section 3-3,
matrix multiplication has been defined in terms of the product of two linear
transformations.
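The statement that $T_3 = T_2T_1$ is represented by $A_3 = A_2A_1$ can be illustrated with a small computation. The sketch below is only an illustration; the matrices `A1`, `A2` and the helper names are assumptions chosen for the example. It applies the two transformations in sequence and compares the result with the single transformation given by the product matrix.

```python
def matvec(A, x):
    """Return the image A x of the point x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def matmul(A, B):
    """Return the matrix product A B (columns of A must match rows of B)."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

A1 = [[1, 2],          # represents T1: E^2 -> E^3
      [0, 1],
      [3, 0]]
A2 = [[1, 0, 1],       # represents T2: E^3 -> E^2
      [2, 1, 0]]
x = [1, -1]

sequential = matvec(A2, matvec(A1, x))   # T2[T1(x)], as in (4-5)
A3 = matmul(A2, A1)                      # matrix of the product T2 T1
direct = matvec(A3, x)                   # T3(x)
```

Note that the product is formed as `matmul(A2, A1)`: the transformation applied first appears on the right.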
4-3 Rank. The dimension of the range of a linear transformation represented
by $A$ is the maximum number of linearly independent columns
in $A$; as such, it tells us quite a bit about the linear transformation and
the matrix $A$. We shall call the maximum number of linearly independent
columns in $A$ the rank of $A$.

Rank: The rank (or more precisely the column rank) of an $m \times n$ matrix
$A$, written $r(A)$, is the maximum number of linearly independent columns
in $A$.

We shall see that, for many purposes, one of the most important aspects 
of a matrix is its rank. Introduction of this concept enables us to tie up 
a number of loose ends in matrix theory and to develop some basic ideas 
in greater detail. 

Example: The rank of an nth-order identity matrix is $n$ since $I =
(e_1, \ldots, e_n)$, and the $e_i$ are linearly independent.

We are frequently faced with the problem of determining the rank 
of the matrix C = AB from the given ranks of the matrices A, B. In 
general, the rank of C is not uniquely determined by the ranks of A, B; 
however, the following inequality holds: 

$$ r(AB) \leq \min[r(A), r(B)]; \tag{4-7} $$

that is, the rank of the product AB of two matrices cannot be greater than the 
smaller of the ranks of A, B. 

The truth of (4-7) becomes evident if we think of A, B as representing 
linear transformations. Let us suppose that A is m X r and B is r X n. 
The product AB can be viewed as a single linear transformation taking 
E n into a subspace of E m , and also as two linear transformations applied 
sequentially. First, B takes E n into a subspace of E r , then A takes this 
subspace of E r into a subspace of E m . From a geometrical point of view, 
it is clear that the dimension of the subspace of E m cannot be greater 
than the smaller of the dimensions of: (1) the subspace of E r obtained 
by B transforming E n , (2) the subspace of E m that would result if A 
transformed all of E r . 

Let us now prove Eq. (4-7) analytically. We write

$$ z = Bx. \tag{4-8} $$

Then

$$ y = ABx = Az = \sum_{i=1}^{r} z_i a_i. \tag{4-9} $$

Thus all vectors $y$ in the subspace of $E^m$ are linear combinations of the
columns of $A$. Hence $r(AB)$ cannot be greater than $r(A)$, for otherwise it
would be impossible to express every vector $y$ as a linear combination of
the columns of $A$.

Next, we note that every $z$ can be written as a linear combination of
$r(B) = k$ columns from $B$. Let the set of $k$ linearly independent columns
be denoted by $b_1, \ldots, b_k$. Hence

$$ z = \sum_{i=1}^{k} \alpha_i b_i \tag{4-10} $$

for every $z$ in the subspace of $E^r$. Using Eq. (4-10), we can write every $y$
in the subspace of $E^m$ as

$$ y = \sum_{i=1}^{k} \alpha_i A b_i; \tag{4-11} $$

that is, every $y$ is a linear combination of $k$ vectors. Although we assumed
that the $b_i$ were linearly independent, this is not necessarily true for the
$Ab_i$. Hence $r(AB) \leq r(B)$, and Eq. (4-7) is proved.
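The inequality (4-7) can be observed on small matrices. The sketch below is an illustration under stated assumptions: the helper `rank` uses exact row reduction over the rationals (the technique developed in Section 4-6), and the matrices are hypothetical examples. One product attains equality, the other gives a strict inequality.

```python
from fractions import Fraction

def matmul(A, B):
    """Return the matrix product A B."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def rank(M):
    """Rank of M, found by exact Gaussian elimination on the rows."""
    rows = [[Fraction(v) for v in row] for row in M]
    r = 0                                   # number of pivots found so far
    for c in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue                        # no pivot in this column
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            f = rows[i][c] / rows[r][c]
            rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

A = [[1, 0], [0, 0]]        # rank 1
B = [[0, 0], [0, 1]]        # rank 1

strict_case = rank(matmul(A, B))    # AB is the zero matrix
equal_case = rank(matmul(A, A))     # AA = A
```

In the first case $r(AB) < \min[r(A), r(B)]$; in the second the equality sign holds.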

The problems at the end of the chapter will show that in some cases
the equality sign and in others the strict inequality sign will hold in
Eq. (4-7). However:

If a matrix of rank $k$ is multiplied in either order by a nonsingular matrix,
the rank of the product is $k$. This can be proved easily from Eq. (4-7).
Let $r(AB) = R$; assume that matrix $A$ is nonsingular and that $r(B) = k$.
From (4-7)

$$ R \leq k. \tag{4-12} $$

But

$$ B = A^{-1}(AB). \tag{4-13} $$

Applying (4-7) to (4-13), we obtain

$$ k \leq R. \tag{4-14} $$

Comparison of (4-12) and (4-14) yields

$$ k = R. \tag{4-15} $$

If $B$ is nonsingular and $r(A) = k$, the proof can be established in exactly
the same way.

From the preceding result we immediately see that all nth-order
nonsingular matrices have the same rank. Let $A$, $B$ be two nth-order
nonsingular matrices, and $r(A) = k_1$, $r(B) = k_2$. From (4-15)

$$ r(AB) = k_1, \qquad r(AB) = k_2; $$

hence

$$ k_1 = k_2. \tag{4-16} $$
4-4 Rank and determinants. An interesting relation exists between
the rank of a matrix $A$ and the order of its nonvanishing minors, which is
of considerable theoretical importance and, in addition, provides a means
for computing the rank of a matrix.

The rank of an $m \times n$ matrix $A$ is $k$ if and only if every minor in $A$ of
order $k + 1$ vanishes, while there is at least one minor of order $k$ which does
not vanish. To prove the necessity, assume that $r(A) = k$. Then any
k + 1 columns of A are linearly dependent, and any column in A can be 
expressed as a linear combination of some set of k columns. Hence, the 
columns of any submatrix of order k + 1 can be expressed as linear 
combinations of k columns from A. Each of the determinants obtained 
by expanding the determinant of this submatrix vanishes because its 
associated matrix has two identical columns, and hence the determinant 
of the submatrix vanishes. This holds true for every submatrix of order 
k + 1, that is, all minors of order k + 1 vanish. 

Next, we must show that there is at least one minor of order $k$ which
does not vanish. Assume that the opposite holds, that is, that all
determinants of order $k$ vanish. Select $k$ columns from $A$ which are linearly
independent. Without loss of generality, we can consider them to be the
first $k$ columns. Assume that all determinants of order $k$ in the first $k$
columns and, in particular, the determinant of the submatrix formed from
rows $1, \ldots, k$, vanish. Thus, expanding in cofactors by row $k$, we find

$$ \sum_i a_{ki} A_{ki} = 0. \tag{4-17} $$

The same cofactors are obtained if we form a $k \times k$ submatrix from the
first $k - 1$ rows and any other row $j$, $j = k + 1, \ldots, m$, and expand
by row $j$. This implies that

$$ \sum_i a_{ji} A_{ki} = 0, \qquad j = k + 1, \ldots, m. \tag{4-18} $$

Furthermore,

$$ \sum_i a_{ji} A_{ki} = 0, \qquad j = 1, \ldots, k - 1, \tag{4-19} $$

since we are expanding by one row and are using the cofactors of another.
Combining (4-17), (4-18), and (4-19), we obtain

$$ \sum_i A_{ki} a_i = 0. \tag{4-20} $$

This implies that columns $a_1, \ldots, a_k$ are linearly dependent if the $A_{ki}$
are not all zero. In fact, if there were any minor of order $k - 1$ in the first
$k$ columns which did not vanish, we could rearrange the rows and columns
so that at least one $A_{ki}$ in (4-20) would be different from zero. If all
determinants of order $k - 1$ formed from the first $k$ columns of $A$ were
zero, we could repeat the same procedure with $k - 1$ columns and show
that the $k - 1$ columns are linearly dependent if the determinants of
order $k - 2$ do not all vanish.

Conceivably, we may in this way arrive at a point where all
determinants of order 2 vanish. Now we have reduced the problem to showing
that any set of two columns is linearly dependent. This linear dependence
would follow even if all determinants of order 1 vanished, since then the
columns would be composed of zeros only. Thus, using the above
procedure, we have contradicted our assumption that the $k$ columns were
linearly independent. Hence, there must be at least one nonvanishing
determinant of order $k$ in any set of $k$ linearly independent columns.
Thus we have proved that if $r(A) = k$, then all determinants of order
$k + 1$ in $A$ vanish, and that there is at least one determinant of order $k$
which does not vanish. To prove the sufficiency, let us assume that all
determinants of order $k + 1$ vanish and that there is one determinant of
order $k$ which does not vanish. The proof of the necessity showed that any
$k + 1$ columns whose determinants of order $k + 1$ all vanish are linearly
dependent. Hence $r(A) \leq k$. Now let us consider the columns associated
with any determinant of order $k$ which does not vanish. These columns
cannot be linearly dependent or, as shown in the necessity, all
determinants of order $k$ formed from them would vanish. Hence $r(A) = k$.

Note: If all determinants of order $k + 1$ in $A$ vanish, all determinants
of order $k + r$ also vanish ($r \geq 1$). Cofactor expansion immediately
demonstrates the correctness of this statement.

The above result gives us a means for computing the rank of any matrix: 
We look for the largest nonvanishing determinant in A; the order of this 
determinant is the rank of A. To find nonvanishing determinants, it is not 
necessary to consider only adjacent rows or columns. Any rows and 
columns can be chosen to form the determinant. 

We now have a way of testing whether any given set of $m$ vectors is
linearly independent. A matrix is formed with the vectors as columns.
The rank of this matrix is found by the method just described (or by any
other method). If the rank is $m$, then the vectors are linearly independent.
If the rank is $k < m$, the vectors are not linearly independent; in addition,
we have determined the maximum number of linearly independent vectors
in the set.
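The rank test described above can be sketched as follows. This is a minimal illustration, not an efficient procedure: the helper names and the three vectors are assumptions chosen for the example, and the determinant is computed by plain cofactor expansion. The function searches for the largest square submatrix, formed from any rows and columns, whose determinant does not vanish.

```python
from itertools import combinations

def det(M):
    """Determinant of a square matrix by cofactor expansion along row 1."""
    if len(M) == 1:
        return M[0][0]
    total = 0
    for j, a in enumerate(M[0]):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * a * det(minor)
    return total

def rank_by_minors(A):
    """Order of the largest nonvanishing minor of A (0 for a zero matrix)."""
    m, n = len(A), len(A[0])
    for k in range(min(m, n), 0, -1):
        for rows in combinations(range(m), k):
            for cols in combinations(range(n), k):
                sub = [[A[i][j] for j in cols] for i in rows]
                if det(sub) != 0:
                    return k
    return 0

def max_independent(vectors):
    """Maximum number of linearly independent vectors in the given set."""
    A = [list(row) for row in zip(*vectors)]   # vectors become the columns
    return rank_by_minors(A)

# Hypothetical vectors from E^3; v3 = v1 + v2, so the set is dependent.
v1, v2, v3 = [1, 2, 1], [2, 4, 0], [3, 6, 1]
k_all = max_independent([v1, v2, v3])
k_pair = max_independent([v1, v2])
```

Since `k_all` is smaller than the number of vectors, the three vectors are linearly dependent, while the pair `v1`, `v2` is independent. The number of minors grows rapidly with the size of the matrix, so this approach is only practical for small cases.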


Examples:

(1) $A = \begin{bmatrix} 1 & 3 \\ 4 & 2 \end{bmatrix}$ has rank 2 since $\begin{vmatrix} 1 & 3 \\ 4 & 2 \end{vmatrix} \neq 0$.

(2) $A = \begin{bmatrix} 2 & 3 \\ 1 & 1.5 \end{bmatrix}$ has rank 1 since $\begin{vmatrix} 2 & 3 \\ 1 & 1.5 \end{vmatrix} = 0$, but there are
determinants of order 1, let us say $|2|$, which do not vanish.

(3) $A = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}$ has rank 0 because all elements of $A$ vanish.

(4) $A = \begin{bmatrix} 2 & 0 & 7 \\ 3 & 3 & 6 \\ 2 & 2 & 4 \end{bmatrix}$ has rank 2 since $|A| = 0$, but there are some
determinants of order 2 which do not vanish.
The rank of a matrix has been defined as the number of linearly inde¬ 
pendent columns of A. Strictly speaking, this is a definition of the column 
rank, since we may equally well speak of a row rank of A and define it as 
the maximum number of linearly independent rows in A. The reader may 
feel that we should have been more meticulous, that is, used the term 
“column rank” whenever we referred to the “rank” of a matrix. We 
shall now prove that the row rank of A is equal to the column rank and hence 
the maximum number of linearly independent columns is equal to the maxi¬ 
mum number of linearly independent rows. The rank of a matrix is thus a 
unique number which can be found by computing the maximum number of 
linearly independent rows or columns. For this reason, it was not neces¬ 
sary to distinguish between the two types of rank. 

To prove this equality, let us suppose that the column rank of $A$ is $k$
and the row rank of $A$ is $k'$. But the row rank of $A$ is equal to the column
rank of $A'$. Since, by assumption, the column rank of $A'$ is $k'$, all minors
of order $k' + 1$ in $A'$, and hence in $A$, vanish. Thus $k' \geq k$. However,
there is at least one minor of order $k'$ from $A'$, and hence from $A$, which
does not vanish. Thus $k \geq k'$ and, therefore, $k' = k$. We have proved
that the row rank of $A$ is equal to its column rank.

Let us summarize the results of our study of the concept of rank: If 
r (A) = k, then at least one set of k columns and rows of A is linearly inde¬ 
pendent (there may be, of course, a number of sets of k rows and columns 
in A which contain linearly independent vectors), and no k + 1 rows or 
columns are linearly independent. Furthermore, there is at least one de¬ 
terminant of order k in A which does not vanish. All determinants of 
order k + 1 do vanish. 

A particularly interesting case is presented by an nth-order matrix $A$.
If $A$ is nonsingular, $|A| \neq 0$. Hence, the columns (or rows) of $A$ are
linearly independent and form a basis for $E^n$, and $r(A) = n$. Conversely,
if we have $n$ linearly independent vectors from $E^n$, we can form a matrix
$A$ with the vectors as its columns. This matrix $A$ will have rank $n$, and thus
$|A| \neq 0$; $A$ is nonsingular and has an inverse.

The results of this section and of Section 4-3 can be used to clear up one
question left unanswered in Chapter 3. There we stated, but did not
prove, the fact that only nonsingular matrices have inverses; that is,
if $A$ is an nth-order matrix and there exists an nth-order matrix $B$ such
that $AB = I_n$, then $A$ is nonsingular. In the preceding section, we have
shown that $r(I_n) = n$, so that if $AB = I_n$, then by (4-7) $r(A) = n$, and $r(B) = n$.
In this section, we have seen that if $r(A) = n$, $|A| \neq 0$, and hence $A$ is
nonsingular. Thus we have proved that only nonsingular matrices have
inverses. We have also demonstrated that if the product of any two
nth-order matrices yields the identity matrix, both matrices are
nonsingular. It then follows from this result and the proofs furnished in
Chapter 3 that if two nth-order matrices $A$, $B$ satisfy $AB = I_n$, then
(1) $BA = I_n$; (2) $A^{-1} = B$, $B^{-1} = A$; (3) $|A| \neq 0$, $|B| \neq 0$.

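It is now clear how to express any n-component vector $b$ as a linear
combination of $n$ linearly independent vectors $a_1, \ldots, a_n$ which form a
basis for $E^n$. We wish to find the $y_i$ such that

$$ b = \sum_{i=1}^{n} y_i a_i. \tag{4-21} $$

We define

$$ A = (a_1, \ldots, a_n), \qquad y = [y_1, \ldots, y_n]. \tag{4-22} $$

Then

$$ b = Ay, \quad \text{or} \quad y = A^{-1}b. \tag{4-23} $$

The vector $y$ is found by premultiplying $b$ by $A^{-1}$.

Example: Write $b$ as a linear combination of $a_1$, $a_2$ when
$b = [3, 2]$, $a_1 = [4, 1]$, $a_2 = [2, 5]$:

$$ A = \begin{bmatrix} 4 & 2 \\ 1 & 5 \end{bmatrix}, \qquad A^{-1} = \frac{1}{18}\begin{bmatrix} 5 & -2 \\ -1 & 4 \end{bmatrix}, \qquad y = A^{-1}b = \begin{bmatrix} 11/18 \\ 5/18 \end{bmatrix}. $$

This result is easily verified:

$$ \begin{bmatrix} 3 \\ 2 \end{bmatrix} = \frac{11}{18}\begin{bmatrix} 4 \\ 1 \end{bmatrix} + \frac{5}{18}\begin{bmatrix} 2 \\ 5 \end{bmatrix}. $$

The method outlined above for expressing a vector in terms of a set of
basis vectors is especially convenient if $A^{-1}$ is available.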
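The example above can be reproduced with exact rational arithmetic. The following is a minimal sketch restricted to the $2 \times 2$ case; the helper `inverse_2x2` is an assumption introduced for the illustration and uses the familiar adjoint formula for the inverse of a second-order matrix.

```python
from fractions import Fraction

def inverse_2x2(A):
    """Inverse of a nonsingular 2 x 2 matrix via the adjoint formula."""
    (a, b), (c, d) = A
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular; the vectors are not a basis")
    return [[Fraction(d, det), Fraction(-b, det)],
            [Fraction(-c, det), Fraction(a, det)]]

a1, a2, b = [4, 1], [2, 5], [3, 2]

# Basis vectors form the columns of A, as in (4-22).
A = [[a1[0], a2[0]],
     [a1[1], a2[1]]]
A_inv = inverse_2x2(A)

# y = A^{-1} b, Eq. (4-23).
y = [sum(A_inv[i][j] * b[j] for j in range(2)) for i in range(2)]

# Verification: y_1 a_1 + y_2 a_2 should reproduce b.
check = [y[0] * a1[i] + y[1] * a2[i] for i in range(2)]
```

Working with `Fraction` keeps the coefficients $11/18$ and $5/18$ exact, so the verification step reproduces $b$ without rounding error.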

In the following two sections, we shall develop a more efficient procedure 
for computing the rank of a matrix and, in addition, gain some interesting 
theoretical information. 
4-5 Elementary transformations. There are simple operations which
can be performed on the rows and columns of a matrix without changing 
its rank. By performing such operations, it is possible to convert a matrix 
into one whose rank can be read off by simply looking at the matrix. 
These operations then provide another way to compute the rank of a 
matrix; as a matter of fact, they yield a rather efficient numerical pro¬ 
cedure. In addition, they lead to some interesting theoretical results. 

Three types of operations on the rows of a matrix (called elementary
row operations) are of importance; they are:

(1) Interchange of two rows;

(2) Multiplication of a row by any scalar $\lambda \neq 0$;

(3) Addition to the $i$th row of $\lambda$ times the $j$th row ($\lambda$ any scalar, and
$j \neq i$).

These elementary row operations have one especially interesting
property: They can be performed on the matrix $A$ by premultiplication of $A$
by a matrix $E$; that is, if $B$ is obtained from $A$ by some elementary row
operation on $A$, then there exists a matrix $E$ such that $B = EA$.

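Examples: (1) Exchange the first and third rows in $A$,

$$ A = \begin{bmatrix} 2 & 1 \\ 3 & 4 \\ 5 & 6 \end{bmatrix}; $$

consider

$$ E = \begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix}; \qquad EA = \begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix} \begin{bmatrix} 2 & 1 \\ 3 & 4 \\ 5 & 6 \end{bmatrix} = \begin{bmatrix} 5 & 6 \\ 3 & 4 \\ 2 & 1 \end{bmatrix}. $$

Premultiplication of $A$ by $E$ interchanges the first and third rows in $A$.

(2) Multiply the second row of $A$ by 5. If

$$ E = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 5 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad \text{then} \quad EA = \begin{bmatrix} 2 & 1 \\ 15 & 20 \\ 5 & 6 \end{bmatrix}. $$

Premultiplication of $A$ by $E$ multiplies the second row of $A$ by 5.

(3) Add twice the second row to the third row. When

$$ E = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 2 & 1 \end{bmatrix}, \qquad EA = \begin{bmatrix} 2 & 1 \\ 3 & 4 \\ 11 & 14 \end{bmatrix}. $$

Premultiplication of $A$ by $E$ adds twice the second row of $A$ to the third row.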
Careful study of the E matrices will reveal that they can be obtained 
from the identity matrix by performing on it the appropriate elementary 
operation. 
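The three examples above can be reproduced by building each $E$ from the identity matrix and premultiplying. The sketch below is illustrative only; the function names are assumptions, and rows are numbered from 0 in the code whereas the text numbers them from 1.

```python
def identity(m):
    """Return the m-th order identity matrix."""
    return [[1 if i == j else 0 for j in range(m)] for i in range(m)]

def matmul(A, B):
    """Return the matrix product A B."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def swap_rows(m, i, j):
    """Identity with rows i and j interchanged."""
    E = identity(m)
    E[i], E[j] = E[j], E[i]
    return E

def scale_row(m, i, lam):
    """Identity with row i multiplied by lam (lam must be nonzero)."""
    if lam == 0:
        raise ValueError("the scalar must be nonzero")
    E = identity(m)
    E[i][i] = lam
    return E

def add_multiple(m, i, j, lam):
    """Identity with lam times row j added to row i (j != i)."""
    if i == j:
        raise ValueError("rows i and j must differ")
    E = identity(m)
    E[i][j] = lam
    return E

A = [[2, 1], [3, 4], [5, 6]]

swapped = matmul(swap_rows(3, 0, 2), A)        # exchange rows 1 and 3
scaled = matmul(scale_row(3, 1, 5), A)         # multiply row 2 by 5
added = matmul(add_multiple(3, 2, 1, 2), A)    # add twice row 2 to row 3
```

In each case the operation is applied once to the identity matrix, and the product $EA$ then carries it over to $A$.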

Our simple examples have shown that the elementary operations can
be performed on $A$ through premultiplication by a matrix $E$; next, we shall
demonstrate the general applicability of this procedure. First, if $E$ is to
perform an elementary operation on $A$, then $EA$ must have the same
number of rows and columns as $A$; hence $E$ must be square. Let us suppose
that we wish to interchange rows $i$ and $j$ of the $m \times n$ matrix $A$. If $e_i'$
is the transpose of $e_i$, then

$$ e_i' A = a^i \qquad (a^i \text{ is the } i\text{th row of } A). \tag{4-24} $$

If, in the identity matrix $I_m$, rows $i$ and $j$ are interchanged, then $e_i'$ appears
as row $j$ and $e_j'$ as row $i$. For the other rows, $e_k'$ appears as row $k$. Assuming
$E$ to be the identity matrix with rows $i$ and $j$ interchanged, we see
that $EA$ does indeed merely interchange rows $i$ and $j$ in $A$. In general, we
see that the matrix $E$ which, when postmultiplied by $A$, interchanges the
$i$th and $j$th rows in $A$, is the matrix obtained by interchanging the $i$th
and $j$th rows in $I$. The matrix which induces an interchange of rows $i$ and $j$
in $A$ will be denoted by $E_{ij}$.

If we wish to multiply the $i$th row of $A$ by $\lambda \neq 0$, it is immediately
obvious that the operative matrix $E$ is the identity matrix whose $i$th row
is multiplied by $\lambda$; that is, $e_i'$ is replaced by $\lambda e_i'$. The matrix which, when
postmultiplied by $A$, multiplies the $i$th row of $A$ by $\lambda$ will be denoted by
$E_i(\lambda)$. Again $E_i(\lambda)$ is found by performing the appropriate elementary
operation on the identity matrix.

Finally, consider the addition of $\lambda$ times row $j$ of $A$ to row $i$ ($j \neq i$).
Observe that

$$ (e_i' + \lambda e_j')A = a^i + \lambda a^j. \tag{4-25} $$

The matrix $E$ which performs the required operation is simply the
identity matrix with row $i$ replaced by $e_i' + \lambda e_j'$. Once again, $E$ is found
by applying the proper elementary transformation to the identity matrix.
The matrix $E$ which, when postmultiplied by $A$, adds $\lambda$ times row $j$ of $A$ to
row $i$ will be denoted by $E_i(\lambda|j)$.

The preceding paragraphs have shown that any elementary row
operation can be performed on a matrix simply through multiplication by a
matrix $E$ which, in turn, is obtained by applying the appropriate operation
to the identity matrix. The matrices $E_{ij}$, $E_i(\lambda)$, $E_i(\lambda|j)$ are called
elementary matrices.

The elementary matrices $E_{ij}$, $E_i(\lambda)$, $E_i(\lambda|j)$ are nonsingular and inverses
may be easily obtained as follows: If we interchange rows $i$ and $j$ in $E_{ij}$,
an identity matrix results. Hence

$$ E_{ij}E_{ij} = I. \tag{4-26} $$

When the $i$th row of $E_i(\lambda)$ is multiplied by $1/\lambda$, the identity matrix is
obtained:

$$ E_i(1/\lambda)E_i(\lambda) = I \quad \text{or} \quad E_i^{-1}(\lambda) = E_i(1/\lambda). \tag{4-27} $$

If $\lambda$ times row $j$ is subtracted from the $i$th row of $E_i(\lambda|j)$, the identity
matrix results. Hence,

$$ E_i(-\lambda|j)E_i(\lambda|j) = I \quad \text{or} \quad E_i^{-1}(\lambda|j) = E_i(-\lambda|j). \tag{4-28} $$

The inverse of an elementary matrix is an elementary matrix.
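The three inverse rules (4-26) to (4-28) can be checked by direct multiplication. The sketch below is a small illustration with third-order matrices; the constructor names and the particular values of $\lambda$ are assumptions, and indices are counted from 0 in the code.

```python
from fractions import Fraction

def identity(m):
    return [[1 if i == j else 0 for j in range(m)] for i in range(m)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def E_swap(m, i, j):
    """E_ij: identity with rows i and j interchanged."""
    E = identity(m)
    E[i], E[j] = E[j], E[i]
    return E

def E_scale(m, i, lam):
    """E_i(lam): identity with row i multiplied by lam."""
    E = identity(m)
    E[i][i] = Fraction(lam)
    return E

def E_add(m, i, j, lam):
    """E_i(lam|j): identity with lam times row j added to row i."""
    E = identity(m)
    E[i][j] = Fraction(lam)
    return E

I3 = identity(3)
lam = Fraction(5)

rule_26 = matmul(E_swap(3, 1, 2), E_swap(3, 1, 2)) == I3
rule_27 = matmul(E_scale(3, 0, 1 / lam), E_scale(3, 0, lam)) == I3
rule_28 = matmul(E_add(3, 0, 1, -4), E_add(3, 0, 1, 4)) == I3
```

Each inverse is built by the same constructor as the original matrix, which is the content of the closing remark that the inverse of an elementary matrix is itself elementary.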

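Examples: Suppose that:

(1) $E_{23} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix}$; according to (4-26), $E_{23}^{-1} = E_{23}$:

$$ \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}. $$

(2) $E_1(\lambda) = \begin{bmatrix} \lambda & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$; according to (4-27), $E_1^{-1}(\lambda) = \begin{bmatrix} \lambda^{-1} & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$:

$$ \begin{bmatrix} \lambda^{-1} & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} \lambda & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}. $$

(3) $E_1(4|2) = \begin{bmatrix} 1 & 4 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$; according to (4-28), $E_1^{-1}(4|2) = \begin{bmatrix} 1 & -4 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$:

$$ \begin{bmatrix} 1 & -4 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 4 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}; $$

the rule gives the proper $E_1^{-1}(4|2)$.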

Elementary column operations on a matrix can be defined in the same
way as row operations, that is, they are: (1) interchange of two columns;
(2) multiplication of column $i$ by a scalar $\lambda \neq 0$; (3) addition of $\lambda$ times
column $j$ to column $i$ ($j \neq i$). These operations can be performed on any
matrix $A$ through postmultiplying $A$ by a matrix $F$. The matrix $F$ is found
by performing the required elementary operations on the columns of the
identity matrix. $F_{ij}$ will denote the elementary matrix which, when
premultiplied by $A$, interchanges columns $i$, $j$ of $A$. The symbol $F_i(\lambda)$ will be
used to denote the elementary matrix which, when premultiplied by $A$,
multiplies the $i$th column of $A$ by $\lambda$. The elementary matrix which, when
premultiplied by $A$, adds $\lambda$ times column $j$ to column $i$ of $A$, will be
denoted by $F_i(\lambda|j)$. The elementary row matrices are nonsingular; so are
the elementary column matrices $F_{ij}$, $F_i(\lambda)$, $F_i(\lambda|j)$.

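Example: Find the matrix $F$ which, when premultiplied by $A$,
interchanges columns 1 and 3 and multiplies column 2 by 2:

$$ A = \begin{bmatrix} 3 & 2 & 1 \\ 1 & 5 & 4 \end{bmatrix}. $$

Matrix $F_{13}$ which interchanges columns 1, 3 is obtained by interchanging
columns 1 and 3 in the identity matrix:

$$ F_{13} = \begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix}. $$

Matrix $F_2(2)$ which multiplies column 2 of $A$ by 2 is found by multiplying
column 2 of $I$ by 2:

$$ F_2(2) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 1 \end{bmatrix}. $$

Hence the required matrix $F$ is $F_{13}F_2(2)$, or

$$ F = F_{13}F_2(2) = \begin{bmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} 0 & 0 & 1 \\ 0 & 2 & 0 \\ 1 & 0 & 0 \end{bmatrix}, $$

$$ AF = \begin{bmatrix} 3 & 2 & 1 \\ 1 & 5 & 4 \end{bmatrix} \begin{bmatrix} 0 & 0 & 1 \\ 0 & 2 & 0 \\ 1 & 0 & 0 \end{bmatrix} = \begin{bmatrix} 1 & 4 & 3 \\ 4 & 10 & 1 \end{bmatrix}. $$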

The matrix F does indeed perform the desired elementary column opera¬ 
tions; it is obtained by applying the elementary column operations to the 
identity matrix. 
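The column example can be reproduced in the same spirit as the row examples. The sketch below is an illustration only; the constructor names are assumptions and columns are counted from 0 in the code. Each $F$ is obtained by operating on the columns of the identity matrix, and $A$ is then postmultiplied by the product.

```python
def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def F_swap(n, i, j):
    """F_ij: identity with columns i and j interchanged."""
    F = identity(n)
    for row in F:
        row[i], row[j] = row[j], row[i]
    return F

def F_scale(n, i, lam):
    """F_i(lam): identity with column i multiplied by lam."""
    F = identity(n)
    F[i][i] = lam
    return F

A = [[3, 2, 1],
     [1, 5, 4]]

F = matmul(F_swap(3, 0, 2), F_scale(3, 1, 2))   # F = F_13 F_2(2)
AF = matmul(A, F)                               # postmultiply A by F
```

Because $F$ stands on the right of $A$, it must be of order 3, the number of columns of $A$, whereas a row operation on the same $A$ would require a second-order $E$ on the left.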


4-6 Echelon matrices and rank. By means of a series of elementary 
row operations, any m X n matrix A can be reduced to a so-called echelon 
matrix which has the following structure: 

(1) The first $k$ rows, $k \geq 0$, are nonzero (that is, one or more elements
in the row are not zero) and all the elements of the remaining $m - k$
rows are zero.

(2) In the $i$th row, $i = 1, \ldots, k$ (if $k \geq 1$), the first nonzero element
(reading from left to right) equals unity. The symbol $c_i$ will denote the
column in which the element unity occurs.

(3) The arrangement of the rows is such that $c_1 < c_2 < \cdots < c_k$.


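$$ H = \begin{bmatrix} 0 & 1 & h_{13} & h_{14} & h_{15} & h_{16} \\ 0 & 0 & 0 & 1 & h_{25} & h_{26} \\ 0 & 0 & 0 & 0 & 1 & h_{36} \\ 0 & 0 & 0 & 0 & 0 & 0 \end{bmatrix} \tag{4-29} $$

is a typical example of an echelon matrix.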


We shall furnish constructive proof that any matrix can be converted
into an echelon matrix by elementary row operations. The term
"constructive," as applied here, means that the proof will actually describe
in detail how this reduction is effected. Starting with matrix $A$, we move
to the first column, let us say $j$, which has at least one element different
from zero. If a nonzero element does not occur in the first row, we
interchange the first row and any other row where a nonzero element appears.
This element, whose value will be denoted by $\alpha$, can be converted to unity
by performing the elementary operation of dividing row 1 by $\alpha$. If there
are other nonzero elements in column $j$, they can be reduced to zero by
subtracting $\beta$ times the first row from the row where the nonzero element
occurs ($\beta$ represents the value of the nonzero element). Columns 1 to $j$
and row 1 are now taken care of.

Starting from column j, we move to the next column, h, where at least 
one nonzero element appears below the first row. If the element in row 2 
is zero, we interchange row 2 and any row (with an index >2) having a 
nonzero element. The nonzero element in the new row 2, column h, is 
converted to unity by dividing row 2 by the value of this element. Any 
other nonzero elements in column h (with row index >2) can be con¬ 
verted to zero by subtracting from that row a constant times row 2. 
Continuing in this way, we ultimately obtain an echelon matrix. 
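The constructive procedure just described can be sketched as a short routine. This is an illustration under stated assumptions, not a definitive implementation: the function name `echelon` and the test matrix are hypothetical, exact `Fraction` arithmetic is used to avoid rounding, and when a row interchange is needed the first row with a nonzero element is taken (the text allows any such row).

```python
from fractions import Fraction

def echelon(A):
    """Reduce A to an echelon matrix using elementary row operations."""
    H = [[Fraction(v) for v in row] for row in A]
    m, n = len(H), len(H[0])
    r = 0                                   # row that receives the next leading 1
    for c in range(n):                      # move through the columns in order
        pivot = next((i for i in range(r, m) if H[i][c] != 0), None)
        if pivot is None:
            continue                        # no nonzero element at or below row r
        H[r], H[pivot] = H[pivot], H[r]     # interchange of two rows
        alpha = H[r][c]
        H[r] = [v / alpha for v in H[r]]    # divide the row by alpha
        for i in range(r + 1, m):
            beta = H[i][c]
            if beta != 0:                   # subtract beta times row r
                H[i] = [v - beta * w for v, w in zip(H[i], H[r])]
        r += 1
        if r == m:
            break
    return H

A = [[0, 2, 4],
     [0, 1, 2],
     [0, 3, 7]]
H = echelon(A)
```

For this matrix the first column is skipped, the second column yields the leading 1 of row 1, and the third column yields the leading 1 of row 2 after an interchange, leaving a final row of zeros.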

In the actual process of reducing any matrix A to an echelon matrix, it 
is not necessary to find the elementary matrices and carry out the matrix 
multiplication. The elementary operations can be performed directly. 
It is very important, however, to know that the reduction can be carried 
out by premultiplying A by a matrix E which is the product of elementary 
matrices. Furthermore, E is nonsingular. We can write, therefore, 

H = EA, (4-30) 

where H is an echelon matrix. Since E is nonsingular, r(H) = r(A) (see 
Section 4-3). This result is interesting because the rank of H can now be 
read off at a glance. The rank of H is simply the number of nonzero rows 
in H. 

To prove that $r(H)$ is $k$, the number of nonzero rows in $H$, observe first
that $r(H)$ cannot be greater than $k$. Now it is only necessary to establish
the linear independence of the $k$ nonzero rows. Let $h^1, \ldots, h^k$ denote
the nonzero rows of $H$. We try to determine $\lambda_i$ such that

$$ \sum_i \lambda_i h^i = 0. \tag{4-31} $$

Row $h^1$ has an element unity in column $c_1$, while all other rows $h^i$ have
zeros in this column. Thus, $\lambda_1 = 0$. In the same way, $\lambda_2 = \lambda_3 = \cdots =
\lambda_k = 0$. Therefore, the first $k$ rows are linearly independent and $r(H) = k$.

The reduction of a matrix to an echelon matrix is a fairly efficient nu¬ 
merical procedure for computing the rank of a matrix. It is usually much 
more efficient than the process of finding the largest nonvanishing de¬ 
terminant in A, especially if A is large. However, the task of determining 
the rank of a large matrix is almost always difficult, irrespective of the 
technique applied to the problem. 
Example: By converting $A$ to an echelon matrix determine the rank of

$$ A = \begin{bmatrix} 0 & 0 & 1 & 2 & 8 & 9 \\ 0 & 0 & 4 & 6 & 5 & 3 \\ 0 & 2 & 3 & 1 & 4 & 7 \\ 0 & 3 & 0 & 9 & 3 & 7 \\ 0 & 0 & 5 & 7 & 3 & 1 \end{bmatrix}. $$

Column 2 is the first one to have a nonzero element. Either row 3 or row 4
can be interchanged with row 1. Let us interchange rows 1 and 3. Now a
2 is the first nonzero element in the new row 1. This 2 is reduced to a 1
by dividing the new row 1 by 2. To convert the 3 (row 4, column 2) to
a 0, 3 times the final version of row 1 is subtracted from row 4. After
all these operations are performed, the following matrix is obtained:

$$ \begin{bmatrix} 0 & 1 & 1.5 & 0.5 & 2 & 3.5 \\ 0 & 0 & 4 & 6 & 5 & 3 \\ 0 & 0 & 1 & 2 & 8 & 9 \\ 0 & 0 & -4.5 & 7.5 & -3 & -3.5 \\ 0 & 0 & 5 & 7 & 3 & 1 \end{bmatrix}. $$

The third column consists entirely of nonzero elements. We divide row
2 by 4 and subtract this new row 2 from row 3; then we subtract $-4.5$
times row 2 from row 4, and 5 times row 2 from row 5, and arrive at

$$ \begin{bmatrix} 0 & 1 & 1.5 & 0.5 & 2 & 3.5 \\ 0 & 0 & 1 & 1.5 & 1.25 & 0.75 \\ 0 & 0 & 0 & 0.5 & 6.75 & 8.25 \\ 0 & 0 & 0 & 14.25 & 2.625 & -0.125 \\ 0 & 0 & 0 & -0.5 & -3.25 & -2.75 \end{bmatrix}. $$

Moving next to the fourth column and third row, we obtain, after dividing
row 3 by 0.5 and making the appropriate subtractions from rows 4 and 5,

$$ \begin{bmatrix} 0 & 1 & 1.5 & 0.5 & 2 & 3.5 \\ 0 & 0 & 1 & 1.5 & 1.25 & 0.75 \\ 0 & 0 & 0 & 1 & 13.50 & 16.50 \\ 0 & 0 & 0 & 0 & -189.75 & -235.25 \\ 0 & 0 & 0 & 0 & 3.50 & 5.50 \end{bmatrix}. $$

Continuing in this manner, we carry out the last two steps and obtain

$$ \begin{bmatrix} 0 & 1 & 1.5 & 0.5 & 2 & 3.5 \\ 0 & 0 & 1 & 1.5 & 1.25 & 0.75 \\ 0 & 0 & 0 & 1 & 13.50 & 16.50 \\ 0 & 0 & 0 & 0 & 1 & 1.240 \\ 0 & 0 & 0 & 0 & 0 & 1 \end{bmatrix}. $$

Consequently, the rank of this matrix, and hence of the original matrix, 
is 5. The preceding computations will have given the reader some aware¬ 
ness of the difficulty of determining the rank of a matrix. He may also 
feel that it would have been easier to find the largest nonvanishing de¬ 
terminant. In our example, the amount of work required is about the 
same, whatever our choice of procedure. For larger matrices, however, the 
systematic reduction of a matrix to its echelon form is vastly preferable. 
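The hand computation of the example can be checked mechanically. The sketch below is a minimal illustration; the function names are assumptions, and the matrix entered is the $5 \times 6$ matrix of the example as read from the row operations described in the text. It performs the same reduction in exact arithmetic and counts the nonzero rows of the resulting echelon matrix.

```python
from fractions import Fraction

def echelon(A):
    """Reduce A to an echelon matrix by elementary row operations."""
    H = [[Fraction(v) for v in row] for row in A]
    m, n = len(H), len(H[0])
    r = 0
    for c in range(n):
        pivot = next((i for i in range(r, m) if H[i][c] != 0), None)
        if pivot is None:
            continue
        H[r], H[pivot] = H[pivot], H[r]          # interchange rows
        alpha = H[r][c]
        H[r] = [v / alpha for v in H[r]]         # make the leading element 1
        for i in range(r + 1, m):
            beta = H[i][c]
            H[i] = [v - beta * w for v, w in zip(H[i], H[r])]
        r += 1
        if r == m:
            break
    return H

def rank_from_echelon(A):
    """Rank of A: the number of nonzero rows in its echelon form."""
    return sum(1 for row in echelon(A) if any(v != 0 for v in row))

A = [[0, 0, 1, 2, 8, 9],
     [0, 0, 4, 6, 5, 3],
     [0, 2, 3, 1, 4, 7],
     [0, 3, 0, 9, 3, 7],
     [0, 0, 5, 7, 3, 1]]

H = echelon(A)
r = rank_from_echelon(A)
```

Every row of `H` is nonzero, so the computed rank agrees with the value 5 found by hand, and the intermediate entries such as 13.5 and 16.5 in row 3 are reproduced exactly.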

To determine the rank of a matrix, it is only necessary to reduce it 
to an echelon matrix. However, by applying elementary column opera¬ 
tions to the echelon matrix, we can carry the reduction even further. If 
the leading unity element appears in column j of row 1, then all the ele¬ 
ments of row 1 in the columns following j can be reduced to zero by sub¬ 
tracting from any given column r with index r > j the appropriate multi¬ 
ple of column j . Since the only nonzero element of column j appears in 
row 1, this operation does not affect any nonzero elements in column r 
with row index >1. If the leading unity element of row 2 occurs in column 
q, q > j , the same procedure can be used to reduce all nonzero elements 
of row 2 in the columns following q to zero. Note that, because of the 
transformations on row 1, the only nonzero element of column q occurs 
in row 2. The same reduction is then carried out for the remaining rows. 
The resulting matrix is such that each of the first k rows has only one 
nonzero element, and this element is unity. The remaining m — k rows 
are composed entirely of zeros. Furthermore, no column can contain 
more than a single nonzero entry. Precisely k columns will be unit vectors 
and the remaining n — k columns will be composed entirely of zeros. 
Finally, we can interchange the columns so that the unity element appears
in row $i$ and column $i$, $i = 1, \ldots, k$. By a series of elementary column
operations (characterized by the matrix $F$) we have reduced the echelon
matrix $H$ to the unique form

$$ HF = \begin{bmatrix} I_k & 0 \\ 0 & 0 \end{bmatrix}. \tag{4-32} $$

But $H = EA$; thus, by a series of elementary row and column operations
characterized by the nonsingular matrices $E$, $F$, any matrix $A$ can be
converted to the form

$$ EAF = \begin{bmatrix} I_k & 0 \\ 0 & 0 \end{bmatrix} \tag{4-33} $$

if $r(A) = k$. Depending on the size of $A$, some or all of the 0 submatrices
in the right-hand side of (4-33) may not appear. If $A$ is an nth-order
nonsingular matrix, $EAF = I_n$.

Equivalence transformation: A transformation of the type

$$ B = EAF, \tag{4-34} $$

where $E$, $F$ are nonsingular matrices, is called an equivalence transformation
on $A$.

Matrix $B$ is said to be equivalent to $A$.
Problems

4-1. Show whether the following transformations are linear:

(a) y = 2x + - > (b) y = In x, (c) y = e x 

X 

(d) $y = \sin x$, (e) $\ln y = a \ln x$, (f) $y = 3x_1 + 2x_2$.

4-2. Illustrate geometrically the effect of the linear transformation 

fa ll 


A = 


2 5 


on the square whose vertices are [0, 0], [1, 0], [1, 1], [0, 1]. 

4-3. Interpret geometrically the meaning of the linear transformation on $E^2$,

$$ y = 2x_1 + 3x_2. $$

Hint: Consider a line normal to the lines which are given by

$$ y = 2x_1 + 3x_2. $$

4-4. Interpret geometrically the transformation produced on $E^2$ by

$$A = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 1 & 0 \end{bmatrix}.$$

What is the rank of the transformation? What is the dimension of the range?
4-5. Find the rank of the following matrices: 


(a) A = (1), 

(b) A = 

1 _ 

. (c) A = 3 

2 - (d) A = 

V 



Li' 2j 

[l 

ij 

0 . 



r T 

3 2 

1 


3 2 1' 

(e) A = 

], (0A- 

4 6 

2 

- (g) A = 

3 2 1 


LU 

.0 0 

0_ 


0 0 0. 

4-6. Find the rank of A, B, 

C where C = AB. 


(a) A = 

<n' 

(M 

1_1 

> B = 

'3 l" 

2 1_ 

f 

(b) A = 

.0* 

i B 

(c) A = 

3 l" 

.2 1. 

, B = 

'4 2 

_3 1. 

i 

(d) A - 

l-1 

O t— 

o ^ 

> B 


■[-::]

4-7. Find the rank of the following matrix by reducing it to an echelon matrix:


2 

3 

A- I , 

-9 

4 

4-8. Given the matrix 


1 

5 

7 

-8 


4 

9 

3 

7 


2 -20 


A = 


0 

1 

2 


-3 -10 
2 4 


10 

3 

1 

11 


-20 -24 


4 5 6 -2 
8 


-2 7 0 
3 1 3 
L 3 2 1 


(a) Find the matrix which, when postmultiplied by A, interchanges rows 1 
and 3, and 2 and 4. Carry out the multiplication to show that the required inter¬ 
change is actually accomplished. 

(b) Find the matrix which adds 6 times the third row to the first, multiplies 
the second row by 10, and then interchanges the second and third rows. Carry 
out the multiplication to show that the required transformations are obtained. 

4-9. Given the matrix of Problem 4-8. 

(a) Find the matrix which, when premultiplied by A, interchanges columns 1 
and 3 and multiplies columns 2 and 4 by 8 . Carry out the multiplication. 

(b) Find the matrix which adds 2 times the third column plus 4 times the 
second column to the first column. Carry out the multiplication to demonstrate 
that the correct result has been obtained. 

4-10. Find the matrices E, F such that I 3 = EAF: 


A = 


4 1 

2 5 

3 2 



4-11. Show that any matrix A with r(A) = r can be written

$$A = R_1 \begin{bmatrix} I_r & 0 \\ 0 & 0 \end{bmatrix} R_2.$$

What are $R_1$ and $R_2$?

4-12. If A, B are m × n matrices with the same rank, show that there exist
nonsingular matrices $R_1$, $R_2$ such that

$$B = R_1 A R_2.$$

What are $R_1$, $R_2$?

4-13. Which of the matrices representing elementary row transformations
commute and which do not? 

4-14. Evaluate the determinant of each of the matrices representing ele¬ 
mentary row and column operations. Note that the values of the determinants 
are independent of the order of the matrices. 

4-15. If E, F are elementary matrices and A is a square matrix, prove that 


|EA| = |E| |A|, |AF| = |A| |F|. 


4-16. Prove that the determinant of the product of two or more elementary 
matrices is the product of the determinants. 

4-17. Prove that if A is a nonsingular matrix, it can be written as the product 
of elementary matrices. Hint: see Problem 4-11. 

4-18. Prove that if A, B are nth-order matrices,

$$|AB| = |A|\,|B|.$$

Hint: First, assume that A, B are nonsingular and use the results of Problems
4-16 and 4-17. What is true if either A, or B, or both happen to be singular?

4-19. We have seen that, by a sequence of elementary row operations, any
matrix A could be reduced to an echelon matrix. Show that, by additional
elementary row transformations, a nonsingular matrix A can be reduced to an
identity matrix; that is, prove that if A is an nth-order nonsingular matrix, then
there exists a nonsingular matrix E such that

$$I_n = EA.$$

4-20. Using the result of Problem 4-19, find a matrix E such that $I_3 = EA$
when



7 

0 

6 


9 

1 • 
5 


What is E? 

4-21. In Chapter 3, several methods were presented for evaluating de¬ 
terminants. None of the expansions, however, yielded a very efficient numerical 
procedure. The use of elementary transformations provides the key to a reason¬ 
ably efficient technique. Note that, if some row or column of a determinant 
has only one element which differs from zero, we can immediately expand by 
that row or column and reduce the determinant to one whose order is one less 
than that of the original. Let us choose any nonzero element in the determinant, 
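for example, $a_{11}$. Divide the first row or column by $a_{11}$. Reduce the
$a_{1j}$ ($j \neq 1$) to zero by adding a suitable multiple of column 1. Thus

$$\begin{vmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{vmatrix}
= a_{11} \begin{vmatrix} 1 & \dfrac{a_{12}}{a_{11}} & \cdots & \dfrac{a_{1n}}{a_{11}} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{vmatrix}
= a_{11} \begin{vmatrix} 1 & 0 & \cdots & 0 \\ a_{21} & a'_{22} & \cdots & a'_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a'_{n2} & \cdots & a'_{nn} \end{vmatrix}
= a_{11} \begin{vmatrix} a'_{22} & \cdots & a'_{2n} \\ \vdots & & \vdots \\ a'_{n2} & \cdots & a'_{nn} \end{vmatrix},$$

where

$$a'_{ij} = a_{ij} - \frac{a_{1j}}{a_{11}}\, a_{i1}.$$

The same procedure is repeated with

$$\begin{vmatrix} a'_{22} & \cdots & a'_{2n} \\ \vdots & & \vdots \\ a'_{n2} & \cdots & a'_{nn} \end{vmatrix}.$$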


This technique is called pivotal condensation. The element reduced to unity 
at any stage is the pivot. In this case, the element an was the initial pivot. 
It is often advisable to choose as pivot the largest element in absolute value.
Why? Apply the pivotal condensation method to the evaluation of the
determinant

$$\begin{vmatrix} 4 & 5 & -1 & 3 & -2 \\ 8 & 9 & 0 & 7 & 6 \\ -4 & 3 & 2 & 8 & -7 \\ 5 & 1 & 1 & -6 & 2 \\ 3 & 10 & 1 & -1 & 9 \end{vmatrix}.$$
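
As an aid to checking hand computations, pivotal condensation can be sketched as follows. This is only an illustration under stated assumptions: the function name `det_pivotal` is hypothetical, the pivot is always taken as the entry of largest absolute value, and exact rational arithmetic is used. When the pivot lies in row i and column j rather than in position (1, 1), expanding by that row contributes the sign factor (-1)^(i+j).

```python
from fractions import Fraction

def det_pivotal(A):
    """Determinant of a square matrix by pivotal condensation."""
    M = [[Fraction(v) for v in row] for row in A]
    value = Fraction(1)
    while len(M) > 1:
        n = len(M)
        # Choose as pivot the entry of largest absolute value.
        i, j = max(((r, c) for r in range(n) for c in range(n)),
                   key=lambda rc: abs(M[rc[0]][rc[1]]))
        pivot = M[i][j]
        if pivot == 0:
            return Fraction(0)          # every entry is zero
        # Factor out the pivot; expansion by row i, column j gives the sign.
        value *= pivot * (-1) ** (i + j)
        # Condensed entries: a'_rc = a_rc - (a_ic / pivot) * a_rj,
        # with row i and column j deleted.
        M = [[M[r][c] - M[i][c] / pivot * M[r][j]
              for c in range(n) if c != j]
             for r in range(n) if r != i]
    return value * M[0][0]
```

Each pass lowers the order of the determinant by one, so an nth-order determinant needs n - 1 condensations.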

4-22. If we wish to reduce A to an echelon matrix and, at the same time, 
find matrix E which carries out the reduction, reduce columns 1 through n of 
matrix (A, I) to an echelon matrix. After reduction, E is found in the columns 
originally occupied by I since 

E(A, I) = (H, E). 

Reduce

$$\begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}$$

to an echelon matrix and, at the same time, find the matrix E which, when
postmultiplied by A, yields the echelon matrix.

4-23. The results of Problem 4-22 can be used to develop a numerical
procedure for finding the inverse of a nonsingular matrix. Problem 4-19 shows that
there exists a nonsingular matrix E such that I = EA, and hence $E = A^{-1}$.
Thus, if we form (A, I) and reduce the first n columns to an identity matrix by
elementary row operations, the last n columns will contain $A^{-1}$. Using this
method, compute the inverse of


A = 


4 1 
2 6 
3 2 



4-24. Express the vector b = [2, 7] in terms of the following bases:

(a) $a_1 = [3, 4]$, $a_2 = [2, 5]$; (b) $a_1 = [1, 1]$, $a_2 = [0, 1]$;

(c) $a_1 = [7, 2]$, $a_2 = [3, 1]$; (d) $a_1 = [2, 1]$, $a_2 = [3, 5]$.


4-25. Prove that every linear transformation of $E^n$ into a subspace of $E^m$ is
equivalent to, and can be represented by, a matrix transformation. Hint: The
linear transformation T is completely characterized by the way it transfers the
unit vectors, since

$$T(x) = \sum_{j=1}^{n} x_j T(e_j).$$

However, $T(e_j)$ is an element of $E^m$ and can be written

$$T(e_j) = \sum_{i=1}^{m} a_{ij} \epsilon_i,$$

where the $\epsilon_i$ are the unit vectors for $E^m$. Does the matrix $A = \|a_{ij}\|$ characterize
the linear transformation? Show that if y = T(x), then y = Ax. Using the
preceding results, prove that, for fixed bases in $E^n$, $E^m$, the matrix which
characterizes the linear transformation is unique.

4-26. Consider the linear transformation of $E^3$ into $E^3$ described by

$$T(e_1) = 8\epsilon_1 + 6\epsilon_2 + \epsilon_3, \quad T(e_2) = 2\epsilon_1 + 5\epsilon_3, \quad T(e_3) = \epsilon_1 + 2\epsilon_2,$$

where the $\epsilon_i$ are also unit vectors. What is the matrix A which represents this
transformation in such a way that y = Ax? If, instead of using the unit vectors
$e_1$, $e_2$, $e_3$, we use the vectors $v_1 = [6, 1, 1]$, $v_2 = [3, 7, 5]$, $v_3 = [0, 1, 6]$ for a
basis, what is the matrix which characterizes the linear transformation?

4-27. A linear transformation takes the vector $a_1 = [3, 4]$ into $b_1 = [-5, 6]$
and the vector $a_2 = [-1, 2]$ into $b_2 = [4, 1]$. What matrix represents this
linear transformation when the unit vectors are used as a basis?

4-28. A linear transformation T takes [1, 1] into [0, 1, 2] and [-1, 1] into
[2, 1, 0]. What matrix A represents T relative to the basis [1, 1], [-1, 1] in $E^2$
and $\epsilon_1$, $\epsilon_2$, $\epsilon_3$, the unit vectors, in $E^3$?

4-29. Let $T_1$ be a linear transformation which takes $E^n$ into a subspace of
$E^r$ and let B characterize this transformation for fixed bases in $E^n$ and $E^r$.
Furthermore, assume that $T_2$ is a linear transformation which takes $E^r$ into a
subspace of $E^m$ and that A characterizes this transformation for fixed bases in
$E^r$ and $E^m$. Under what conditions does AB characterize the product $T_2 T_1$?
Carry out the details to show that if appropriate assumptions are made, AB
does represent $T_2 T_1$.

4-30. Consider the elementary operation of multiplying a row or column of a
matrix by a scalar $\lambda$; why do we require that $\lambda \neq 0$?

4-31. Show that if columns i and j are interchanged in a nonsingular matrix B,
the inverse of the new matrix can be found by interchanging rows i and j in $B^{-1}$.
Hint: The new matrix can be written BF.

4-32. Let A be a matrix obtained from a nonsingular matrix B by a given
series of elementary operations. Discuss how $A^{-1}$ can be found from $B^{-1}$.
Consider individually each type of elementary operation and the sequence in
which they are performed.

4-33. A set of basis vectors $b_1, \ldots, b_n$ for $E^n$ is said to be triangular if, for
$b_j = [b_{1j}, \ldots, b_{nj}]$, $b_{ij} = 0$ ($i > j$). This means that the matrix $B = (b_1, \ldots, b_n)$
is triangular (see Problem 3-32). Show how a triangular basis may be obtained
for $E^n$ from any set of basis vectors.

4-34. Given a triangular basis for $E^n$, $b_1, \ldots, b_n$; show that if any other
vector x is expressed as a linear combination of the $b_j$, $x = \sum \lambda_j b_j$, the $\lambda_j$
can be computed sequentially and the inverse of $B = (b_1, \ldots, b_n)$ need not
be found. Hint: Only $b_n$ has an nth component different from zero. Thus
$\lambda_n = x_n / b_{nn}$. What is $\lambda_{n-1}$, etc.?

4-35. An nth-order matrix A is called decomposable if by interchanging some
rows and the corresponding columns it is possible to obtain a null matrix in the
lower left-hand corner so that A can be written ($A_{11}$, $A_{22}$ square)

$$A = \begin{bmatrix} A_{11} & A_{12} \\ 0 & A_{22} \end{bmatrix}.$$

By analogy, A is called indecomposable if it is not possible to obtain the required
single zero element in the lower left position. Note that if rows i, j are
interchanged, then columns i, j are interchanged also. It may turn out that $A_{11}$,
$A_{22}$ in the foregoing expression are also decomposable. Any decomposable matrix
can then be reduced to the triangular form

$$\begin{bmatrix} A_1 & B_{12} & \cdots & B_{1k} \\ 0 & A_2 & \cdots & B_{2k} \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & A_k \end{bmatrix},$$

where $A_1, \ldots, A_k$ are indecomposable. Show that the following matrix is
decomposable and write it in triangular form:

$$\begin{bmatrix} 1 & 0 & 0 & 0 & 0 \\ 4 & 4 & 6 & 8 & 1 \\ 9 & 0 & 1 & 2 & 0 \\ 3 & 0 & 0 & 4 & 0 \\ 1 & 3 & 5 & 7 & 0 \end{bmatrix}.$$

4-36. Prove that the square decomposable matrix A can be decomposed by
the equivalence transformation EAF with F = E', E being the product of the
elementary matrices $E_{ij}$.

4-37. Show that reducing a decomposable matrix to the triangular form dis¬ 
cussed in Problem 4-35 does not change the value of the determinant of A. 

4-38. Is the sequence of elementary row and column operations, i.e., the 
matrices E, F which reduce A to the form of Eq. (4-33), unique? Can you pro¬ 
vide an example where they are not unique? 

4-39. Devise a matrix B such that if the same row and column operations 
which reduce A to the form of Eq. (4-33) are performed on B, then E and F 
can be found in B also. 

4-40. Express the vector b = [2, 7] in terms of the following triangular bases:

(a) $a_1 = [3, 4]$, $a_2 = [2, 0]$; (b) $a_1 = [1, 1]$, $a_2 = [0, 1]$;

(c) $a_1 = [7, 2]$, $a_2 = [3, 0]$; (d) $a_1 = [2, 1]$, $a_2 = [3, 0]$.

4-41. Consider the matrices 






2 

<f 

A - 

'3 1 

*i, 

B = 

-3 

2 


2 1 

■j 


—1 

—1 


Show that AB = I 2 . Why do we not want to call B the inverse of A? Interpret 
this geometrically. 

4-42. Consider the process of reducing a matrix A to row echelon form. If
r(A) = k, this can be considered to be a k-stage process. At stage s,
s = 1, . . . , k, the first nonzero element in row s is converted to unity. If this
element lies in column r, all elements in column r with row index > s are reduced
to zero. Denote the elements in the matrix at the beginning of stage s by $u_{ij}$.
Show that the elements $\hat{u}_{ij}$ of the matrix at the end of stage s are given by

$$\hat{u}_{ij} = u_{ij}, \; i < s; \qquad \hat{u}_{sj} = \frac{u_{sj}}{u_{sr}}, \; \text{all } j; \qquad \hat{u}_{ij} = u_{ij} - \frac{u_{sj}}{u_{sr}}\, u_{ir}, \; i > s, \; \text{all } j.$$

The matrix at the beginning of stage s will be the same as the matrix at the end
of stage s - 1 unless it is necessary to interchange rows so that the first
nonzero element of row s will have as low a column index as possible.

4-43. It is desirable to have some sort of automatic check on the numerical
work necessary in reducing a matrix to row echelon form or in finding the
inverse of a matrix by means of the technique discussed in Problem 4-23. A very
simple check, called a sum check, can be used. This check requires only a slight
amount of additional effort. Suppose that we wish to reduce A to row echelon
form. Let $t_i$ be the sum of the elements in the ith row of A, that is,
$t_i = \sum_{j=1}^{n} a_{ij}$, or if $t = [t_1, \ldots, t_m]$, then $t = \sum_{j=1}^{n} a_j$.
Consider the matrix B = (A, t) which
has the same number of rows as A, but one additional column t. We then reduce
B to row echelon form. The check is as follows: At each stage, the element
in column n + 1 of row s should be the sum of the elements in the first n columns
of row s. Thus after each stage, we sum the first n elements in row s; the result
should be the number which appears as element n + 1 of row s. This is done
for every row s. Prove that this is true. Hint: See Problem 4-42.

4-44. By means of the sum check show that the numerical computations in 
the example of Section 4-6 are correct. 

4-45. Discuss in detail the way the sum check is used in inverting a matrix 
by the technique of Problem 4-23. Illustrate this by adding a sum check column 
when computing the inverse of A in that problem. 


Problems Involving Complex Numbers 


4-46. List the important results of this chapter and show that they all hold
if the elements of the matrices are complex numbers. Demonstrate that even
geometrical interpretations can be given a meaning in F n (c), that is, in the
n-dimensional vector space of all n-tuples with complex components.

4-47. Find the rank of the following matrix A by reducing it to row echelon 
form: 


A = 


2 +i 

3 — 4t 

5 

6 + 2t 

7 

3 - i 

2 i 

9 - 3i 

—6i 

8 

7 + 6i 

1 

00 
<s>. 

5 - 3i 

2 i 

6 + i 

1 — i 


4-48. Invert the following matrix A, using the technique suggested in 
Problem 4-23: 



4+ St 

2 

6 — i 

A = 

-5 4- 

i 9 — 2 i 

3 


4t 

-6 

8 4~ f 

4-49. Find nonsingular matrices E, F such that B = EAF:

A =r *■ 

5 

> B = 

3 

6 - 7t 

2 + 3i 


1 — i 


6 i 
4 + 


■l- 

2tJ

4-50. Find the matrix E which, when postmultiplied by A, interchanges rows
1, 3, multiplies row 2 by 7 - i, and subtracts -2 + 3i times row 4 from row 5.
Show by actual multiplication that E does indeed perform these operations.

$$A = \begin{bmatrix} 4 - i & 2 \\ 3i & -4 - 2i \\ 3 + 2i & -5i \\ 5 - 6i & 7 + 2i \\ 7i & 8 - i \end{bmatrix}.$$

4-51. Let A, B, C be matrices with real or complex elements such that C =
AB. Assume that A is m × r and B is r × n. If r(C) = m, show that r(A) = m.
What restriction does this place on r? What can be said about the rank of B?
What restriction does this place on n? If m = n, show that r(A) = r(B) = m.
What restriction is thereby placed on r?




CHAPTER 5 


SIMULTANEOUS LINEAR EQUATIONS 

“Wavering between the profit and the loss 
In this brief transit where the dreams cross” 

T. S. Eliot. 

5-1 Introduction. Sets of simultaneous linear equations appear in most 
linear models. Frequently, the number of equations will be equal to the 
number of variables (as in the Leontief economic and the statistical re¬ 
gression models of Chapter 1); in such cases, as is to be expected, we are 
usually able to solve for unique values of the variables. If there are more 
variables than equations (the constraints of linear programming problems 
may provide such an example), we expect, in general, to obtain an infinite 
number of solutions. Sometimes, we have more equations than variables. 
Thus for the d-c circuit (Chapter 1), it is possible to write down many 
more equations than there are variables. However, not all of these equa¬ 
tions are independent since some of them can be obtained from the others. 
Under such circumstances, it is desirable to find enough independent equa¬ 
tions to be able to solve for all the variables. 

This chapter deals with the theory of simultaneous linear equations. 
We shall be concerned with deriving criteria for the existence and unique¬ 
ness of solutions and with the properties of solutions. We shall begin by 
discussing a fairly efficient numerical technique for solving simultaneous 
equations. 


5-2 Gaussian elimination. In the real world, it is normally expected
that n equations (linear or not) relating n variables can be solved to yield
a set of numerical values for the n variables. Frequently, physical
intuition leads us to assume also that the solution will be unique. Let us
suppose that we have n linear equations relating n variables. They can be
written:

$$\begin{aligned} a_{11}x_1 + \cdots + a_{1n}x_n &= b_1, \\ &\;\;\vdots \\ a_{n1}x_1 + \cdots + a_{nn}x_n &= b_n, \end{aligned} \tag{5-1}$$

or

$$Ax = b \tag{5-2}$$

when Eq. (5-1) is written in matrix form. We wish to solve this system
of equations, that is, find the values $x_1, \ldots, x_n$ which will satisfy the
equations.

A procedure which immediately suggests itself is that of successive
elimination of the variables. Without any loss in generality, we can
assume $a_{11} \neq 0$ since the equations can be rearranged and the variables
renamed to conform with our premise. Then we solve explicitly for $x_1$
and obtain

$$x_1 = -\frac{a_{12}}{a_{11}}\, x_2 - \cdots - \frac{a_{1n}}{a_{11}}\, x_n + \frac{b_1}{a_{11}}. \tag{5-3}$$


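Dividing the first equation by $a_{11}$ to reduce the coefficient of $x_1$ to unity,
and using Eq. (5-3) to eliminate $x_1$ in the remaining n - 1 equations,
we have

$$\begin{aligned}
x_1 + \frac{a_{12}}{a_{11}}\, x_2 + \cdots + \frac{a_{1n}}{a_{11}}\, x_n &= \frac{b_1}{a_{11}}, \\
\left[a_{22} - a_{21}\left(\frac{a_{12}}{a_{11}}\right)\right] x_2 + \cdots + \left[a_{2n} - a_{21}\left(\frac{a_{1n}}{a_{11}}\right)\right] x_n &= b_2 - a_{21}\,\frac{b_1}{a_{11}}, \\
&\;\;\vdots \\
\left[a_{n2} - a_{n1}\left(\frac{a_{12}}{a_{11}}\right)\right] x_2 + \cdots + \left[a_{nn} - a_{n1}\left(\frac{a_{1n}}{a_{11}}\right)\right] x_n &= b_n - a_{n1}\,\frac{b_1}{a_{11}},
\end{aligned} \tag{5-4}$$

or

$$\begin{aligned}
x_1 + a'_{12}x_2 + \cdots + a'_{1n}x_n &= b'_1, \\
a'_{22}x_2 + \cdots + a'_{2n}x_n &= b'_2, \\
&\;\;\vdots \\
a'_{n2}x_2 + \cdots + a'_{nn}x_n &= b'_n.
\end{aligned} \tag{5-5}$$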

If at least one of the $a'_{ij}$ (i, j = 2, . . . , n) differs from zero, we can
assume, without loss of generality, that $a'_{22} \neq 0$. The reduction process
is continued by dividing the second equation of (5-5) by $a'_{22}$ and by using
this equation to eliminate $x_2$ in equations 3, . . . , n. Then, as expected,
$x_3$ is eliminated from equations 4, . . . , n until, finally, we obtain the
system

$$\begin{aligned}
x_1 + h_{12}x_2 + \cdots + h_{1n}x_n &= g_1, \\
x_2 + \cdots + h_{2n}x_n &= g_2, \\
&\;\;\vdots \\
x_{n-1} + h_{n-1,n}x_n &= g_{n-1}, \\
x_n &= g_n.
\end{aligned} \tag{5-6}$$

We obtain immediately $x_n = g_n$. This value of $x_n$ is then substituted into
the preceding n - 1 equations. Thus

$$x_{n-1} = g_{n-1} - h_{n-1,n}\, g_n.$$

This result is also substituted into the remaining n - 2 equations, etc.
By continuing this process of back substitution, we obtain all the $x_i$-
values. This procedure is called gaussian reduction or gaussian elimination.
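
The reduction to the form (5-6) followed by back substitution can be sketched as below. This is a minimal illustration under stated assumptions, not a production routine: the name `gauss_solve` is hypothetical, A is taken to be square, equations are rearranged (rows interchanged) whenever a pivot coefficient vanishes, and exact rational arithmetic replaces floating point.

```python
from fractions import Fraction

def gauss_solve(A, b):
    """Solve Ax = b for a square nonsingular A by gaussian elimination."""
    n = len(A)
    # Each row holds the coefficients of one equation followed by its right side.
    M = [[Fraction(v) for v in row] + [Fraction(rhs)]
         for row, rhs in zip(A, b)]
    for k in range(n):
        # Rearrange the equations so that the coefficient of x_k is nonzero.
        p = next((i for i in range(k, n) if M[i][k] != 0), None)
        if p is None:
            raise ValueError("r(A) < n: the system does not reduce to (5-6)")
        M[k], M[p] = M[p], M[k]
        # Divide equation k by its leading coefficient (coefficient becomes unity).
        pivot = M[k][k]
        M[k] = [v / pivot for v in M[k]]
        # Eliminate x_k from equations k+1, ..., n.
        for i in range(k + 1, n):
            factor = M[i][k]
            M[i] = [vi - factor * vk for vi, vk in zip(M[i], M[k])]
    # Back substitution, beginning with x_n = g_n.
    x = [Fraction(0)] * n
    for i in range(n - 1, -1, -1):
        x[i] = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
    return x
```

After the elimination loop, the first n columns of `M` hold the unit upper triangular coefficients $h_{ij}$ and the last column holds the $g_i$.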

We can now make an interesting observation: Since the final set of
equations can be written in the form of

$$\begin{bmatrix} 1 & h_{12} & \cdots & h_{1n} \\ 0 & 1 & \cdots & h_{2n} \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} =
\begin{bmatrix} g_1 \\ g_2 \\ \vdots \\ g_n \end{bmatrix} \tag{5-7}$$

or as

$$Hx = g, \tag{5-8}$$

it becomes clear that H represents the echelon matrix that would be
obtained from A, following the rules of Section 4-6.* We only have to recall
the result of the operations performed on the elements of A in order to see
that the gaussian reduction scheme does indeed convert A into an echelon
matrix H. If matrix E represents the combination of elementary row
operations which takes A into H, then

$$H = EA, \quad g = Eb. \tag{5-9}$$

This observation supplies us with some useful information. We know 
that (5-1) will reduce to the form of (5-6) if and only if the rank of A 
is n. If rank A is less than n, the method fails for one of two reasons: 
Either we shall not be able to determine the values of all the variables or 
there will be an apparent inconsistency evidenced by the fact that all 
hij vanish in some row, while gi does not vanish. At present, we are not 
sure what this failure implies. This will become clear later. However, we 
know that the method will work if r(A) = n. 

* At this point, it should be obvious that the definition of elementary
operations on matrices follows logically from the manipulations of simultaneous
linear equations.

Instead of eliminating $x_k$ only in equations k + 1, . . . , n, we could
equally well eliminate $x_k$ in equations 1, . . . , k - 1 also, so that $x_k$
would appear only in the kth equation. Now, back substitution is not
needed. This modification of gaussian elimination is called the
Gauss-Jordan method. Both reduction schemes are iterative procedures, and
we would think of them first in attempting to solve a set of linear
equations. Interestingly enough, they are fairly efficient numerical procedures,
and modifications of them are among the methods used for solving systems
of linear equations either by hand or on high-speed computers. The
following example will illustrate both the Gauss and the Gauss-Jordan
methods in a simple case.
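
The Gauss-Jordan variant just described can also be sketched in code. As before, this is only an illustration: `gauss_jordan_solve` is a hypothetical name, A is assumed square, and exact rational arithmetic is used. The only change from ordinary gaussian elimination is that $x_k$ is removed from every other equation, above as well as below equation k, so the solution can be read off directly.

```python
from fractions import Fraction

def gauss_jordan_solve(A, b):
    """Solve Ax = b for a square nonsingular A by Gauss-Jordan reduction."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(rhs)]
         for row, rhs in zip(A, b)]
    for k in range(n):
        # Rearrange the equations so that the coefficient of x_k is nonzero.
        p = next((i for i in range(k, n) if M[i][k] != 0), None)
        if p is None:
            raise ValueError("r(A) < n: no unique solution")
        M[k], M[p] = M[p], M[k]
        # Make the coefficient of x_k in equation k equal to unity.
        pivot = M[k][k]
        M[k] = [v / pivot for v in M[k]]
        # Eliminate x_k from all other equations, not only those below.
        for i in range(n):
            if i != k:
                factor = M[i][k]
                M[i] = [vi - factor * vk for vi, vk in zip(M[i], M[k])]
    # x_k now appears only in equation k, so no back substitution is needed.
    return [row[n] for row in M]
```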

Example: Solve the following set of linear equations:

$$\begin{aligned} 2x_1 + x_2 + 4x_3 &= 16, \\ 3x_1 + 2x_2 + x_3 &= 10, \\ x_1 + 3x_2 + 3x_3 &= 16. \end{aligned}$$

(a) Gauss reduction: We use the first equation to solve for $x_1$ and
substitute this into the second and third equations. This yields

$$\begin{aligned} x_1 + \tfrac{1}{2}x_2 + 2x_3 &= 8, \\ \tfrac{1}{2}x_2 - 5x_3 &= -14, \\ \tfrac{5}{2}x_2 + x_3 &= 8. \end{aligned}$$

Using the second equation from this new set, we eliminate $x_2$ in the third
equation. The first equation remains unchanged. Thus

$$\begin{aligned} x_1 + \tfrac{1}{2}x_2 + 2x_3 &= 8, \\ x_2 - 10x_3 &= -28, \\ 26x_3 &= 78. \end{aligned}$$

From the third equation $x_3 = 3$. Substituting this into the first two
equations, we find

$$x_1 + \tfrac{1}{2}x_2 = 2, \qquad x_2 = 2.$$

Hence

$$x_1 = 1, \quad x_2 = 2, \quad x_3 = 3.$$

(b) Gauss-Jordan reduction: The first step is the same as under (a):

$$\begin{aligned} x_1 + \tfrac{1}{2}x_2 + 2x_3 &= 8, \\ \tfrac{1}{2}x_2 - 5x_3 &= -14, \\ \tfrac{5}{2}x_2 + x_3 &= 8. \end{aligned}$$

Using the second equation, we solve for $x_2$. The result is now substituted
into both the first and third equations. This gives

$$\begin{aligned} x_1 + 7x_3 &= 22, \\ x_2 - 10x_3 &= -28, \\ 26x_3 &= 78. \end{aligned}$$

On obtaining $x_3 = 3$, we immediately find $x_1 = 1$, $x_2 = 2$.


5-3 Cramer's rule. Let us again consider the system of n simultaneous
linear equations in n unknowns (5-1). Repeating the matrix form, we have

$$Ax = b, \tag{5-10}$$

where

$$A = \|a_{ij}\|, \quad x = [x_1, \ldots, x_n], \quad b = [b_1, \ldots, b_n].$$

Now assume r(A) = n so that A is nonsingular. This means that $|A| \neq 0$
and $A^{-1}$ exists. Premultiplying (5-10) by $A^{-1}$, we arrive at

$$A^{-1}Ax = Ix = x = A^{-1}b. \tag{5-11}$$

Hence, when A is nonsingular, we obtain a unique solution $x = A^{-1}b$
to the set of equations (5-1). The solution is unique since the inverse is
unique. Thus, in Eq. (5-11), we have arrived at an explicit solution to the
set of equations through the use of the inverse matrix. We are, however,
no closer to a numerical solution than we were at the outset unless we
happen to know $A^{-1}$. Nevertheless, because of its explicit character the
solution of (5-11) is of great use in theoretical work. The value of the
originally unknown vector is expressed as $x = A^{-1}b$, and $A^{-1}b$ can be
computed from the known quantities A, b.

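Equation (5-11) can be cast into a more interesting form. We recall
from Section 3-18 that the inverse is given by

$$A^{-1} = |A|^{-1} A^{+}; \tag{5-12}$$

$$A^{+} = \begin{bmatrix} A_{11} & \cdots & A_{n1} \\ \vdots & & \vdots \\ A_{1n} & \cdots & A_{nn} \end{bmatrix}, \tag{5-13}$$

where $A_{ij}$ is the cofactor of element $a_{ij}$ in A. Thus we can write Eq. (5-11)
in component form as

$$x_i = |A|^{-1} \sum_{j=1}^{n} A_{ji} b_j = |A|^{-1} \sum_{j=1}^{n} b_j A_{ji}. \tag{5-14}$$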

However, we remember that

$$\sum_{j=1}^{n} a_{ji} A_{ji}$$

is the expansion of |A| by column i. On comparison, we see that

$$\sum_{j=1}^{n} b_j A_{ji}$$

is the expansion of the determinant formed from A by removing the ith
column and replacing it with column b. This yields what is called Cramer's
rule: To obtain the value of $x_i$, we divide by |A| the determinant of the matrix
formed from A by replacing the ith column with b. Hence

$$x_1 = \frac{1}{|A|} \begin{vmatrix} b_1 & a_{12} & \cdots & a_{1n} \\ b_2 & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ b_n & a_{n2} & \cdots & a_{nn} \end{vmatrix}, \;\ldots,\;
x_n = \frac{1}{|A|} \begin{vmatrix} a_{11} & \cdots & a_{1,n-1} & b_1 \\ a_{21} & \cdots & a_{2,n-1} & b_2 \\ \vdots & & \vdots & \vdots \\ a_{n1} & \cdots & a_{n,n-1} & b_n \end{vmatrix}. \tag{5-15}$$


This, of course, is the method learned in elementary algebra for solving
equations by determinants. Cramer's rule is singularly inefficient for
solving a set of equations numerically. The evaluation of the n + 1
determinants involves too much work, especially when n is fairly large.
The gaussian reduction method is considerably more efficient. However,
Cramer's rule is very useful in theoretical studies, as is $x = A^{-1}b$, because
it allows an explicit expression for the solution.


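Example: Solve

$$\begin{aligned} 3x_1 + 2x_2 &= 7, \\ 4x_1 + x_2 &= 1; \end{aligned}
\qquad |A| = \begin{vmatrix} 3 & 2 \\ 4 & 1 \end{vmatrix} = -5 \neq 0.$$

Thus a unique solution exists and is given by

$$x_1 = -\frac{1}{5} \begin{vmatrix} 7 & 2 \\ 1 & 1 \end{vmatrix} = -\frac{1}{5}(7 - 2) = -1,$$

$$x_2 = -\frac{1}{5} \begin{vmatrix} 3 & 7 \\ 4 & 1 \end{vmatrix} = -\frac{1}{5}(3 - 28) = 5.$$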

This solution may be easily verified by substituting the above values into 
the original equations. 
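
Cramer's rule, Eq. (5-15), can be written out directly in code. The sketch below is illustrative only: the names `det` and `cramer` are hypothetical, and the determinant is evaluated by cofactor expansion along the first row, which makes the inefficiency noted above plain, since the work grows roughly like n! per determinant.

```python
from fractions import Fraction

def det(M):
    """Determinant by cofactor expansion along the first row."""
    n = len(M)
    if n == 1:
        return M[0][0]
    total = Fraction(0)
    for j in range(n):
        # Minor: delete row 1 and column j+1.
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * M[0][j] * det(minor)
    return total

def cramer(A, b):
    """Solve Ax = b by Cramer's rule; requires |A| != 0."""
    A = [[Fraction(v) for v in row] for row in A]
    b = [Fraction(v) for v in b]
    d = det(A)
    if d == 0:
        raise ValueError("|A| = 0: Cramer's rule does not apply")
    x = []
    for i in range(len(A)):
        # Replace column i of A by b and divide the determinant by |A|.
        Ai = [row[:i] + [b[r]] + row[i + 1:] for r, row in enumerate(A)]
        x.append(det(Ai) / d)
    return x
```

Applied to the example above, `cramer([[3, 2], [4, 1]], [7, 1])` reproduces the values $x_1 = -1$, $x_2 = 5$.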


5-4 Rules of rank. In the preceding two sections, we have examined
ways of solving a set of n simultaneous linear equations in n unknowns.
We now wish to investigate the conditions which determine whether
solutions do or do not exist. We have seen that a set of n equations in n
unknowns has a unique solution if r(A) = n. We have also noted that
difficulties arose if r(A) < n (although this case was not considered in
any detail). To be completely general, let us consider a set of m
simultaneous linear equations in n unknowns. No restriction will be made as
to whether m > n, m = n, m < n. This set can be written

$$\begin{aligned} a_{11}x_1 + \cdots + a_{1n}x_n &= b_1, \\ a_{21}x_1 + \cdots + a_{2n}x_n &= b_2, \\ &\;\;\vdots \\ a_{m1}x_1 + \cdots + a_{mn}x_n &= b_m, \end{aligned} \tag{5-16}$$

or

$$Ax = b,$$

where A is an m × n matrix.

Next we shall define a new m × (n + 1) matrix $A_b$ which contains A
in the first n columns and b in column n + 1, that is,

$$A_b = (A, b) = \begin{bmatrix} a_{11} & \cdots & a_{1n} & b_1 \\ a_{21} & \cdots & a_{2n} & b_2 \\ \vdots & & \vdots & \vdots \\ a_{m1} & \cdots & a_{mn} & b_m \end{bmatrix}. \tag{5-17}$$

Matrix $A_b$ is a very important quantity; its rank as well as the rank of
A determine whether the set of equations (5-16) has a solution. $A_b$ is
called the augmented matrix of the system.

Since every determinant in A also occurs in $A_b$, the rank of A cannot
exceed that of $A_b$. Hence two possibilities exist: (a) $r(A) < r(A_b)$,
(b) $r(A) = r(A_b)$. It should be noted that $r(A_b)$ cannot be greater than
r(A) + 1. The following paragraph will demonstrate that the cases
(a) and (b) play a crucial role in determining whether Eq. (5-16) has a
solution.

If (a) holds, that is, if $r(A) < r(A_b)$, then there do not exist any $x_j$
satisfying (5-16). The largest nonvanishing determinant in $A_b$ must
contain the column b, since $r(A) < r(A_b)$. Hence b is linearly independent
of the columns of A, and thus there are no $x_j$ such that

$$\sum_{j=1}^{n} x_j a_j = b,$$

where the $a_j$ represent the columns of A; that is, there do not exist any $x_j$
satisfying (5-16). Hence, there is no solution, and the equations are
inconsistent.

However, if (b) holds, that is, $r(A) = r(A_b) = k$, then there is always
at least one solution. Since r(A) = k and $r(A_b) = k$, every column of
$A_b$ can be expressed as a linear combination of k linearly independent
columns of A. (Without loss of generality, we can assume that they are
the first k columns of A.) Since b is a column of $A_b$, there must exist
numbers $x_j$ such that

$$\sum_{j=1}^{k} x_j a_j = b;$$

hence, there is at least one solution to the system of equations (5-16).
Thus, we have proved that: If $r(A) < r(A_b)$, the equations (5-16) are
inconsistent and there is no solution. Conversely, if $r(A) = r(A_b)$, there is
always at least one solution to the set (5-16).

It is important to note that the existence of a solution does not depend
on r(A) = minimum (m, n). Conceivably, we could have 100 equations
and 1000 variables with $r(A) = r(A_b) = 1$; from the preceding discussion
we know that there would be at least one solution.


Examples:

(1) Is there a solution to

$$\begin{aligned} 3x_1 + 4x_2 &= 7, \\ 2.25x_1 + 3x_2 &= 5.25? \end{aligned}$$

$$A = \begin{bmatrix} 3 & 4 \\ 2.25 & 3 \end{bmatrix}, \quad r(A) = 1, \qquad
A_b = \begin{bmatrix} 3 & 4 & 7 \\ 2.25 & 3 & 5.25 \end{bmatrix}, \quad r(A_b) = 1.$$

Thus, the set of equations has a solution. In fact, there are an infinite
number of solutions. These are given by

$$x_1 = \frac{7 - 4x_2}{3}$$

for any $x_2$, since the second equation is just 3/4 times the first. Note that
geometrically the two equations represent the same straight line.

(2) Does a solution exist for the set

$$\begin{aligned} 3x_1 + 4x_2 &= 7, \\ 2.25x_1 + 3x_2 &= 1? \end{aligned}$$

$$r(A) = 1, \qquad A_b = \begin{bmatrix} 3 & 4 & 7 \\ 2.25 & 3 & 1 \end{bmatrix}, \quad r(A_b) = 2.$$

No solution exists. If the first equation is multiplied by 3/4, the left-hand
side becomes the left-hand side of the second equation. However, the
right-hand side of the first equation does not become the right-hand side
of the second equation. The equations are clearly inconsistent. Illustrate
this graphically (two parallel lines which do not intersect).

(3) Is there a solution to

$$\begin{aligned} 3x_1 + 2x_2 + x_3 &= 7, \\ x_1 + 0.5x_2 - x_3 &= 4, \\ x_1 + 0.75x_2 + x_3 &= 5? \end{aligned}$$

$$A = \begin{bmatrix} 3 & 2 & 1 \\ 1 & 0.5 & -1 \\ 1 & 0.75 & 1 \end{bmatrix}, \qquad
A_b = \begin{bmatrix} 3 & 2 & 1 & 7 \\ 1 & 0.5 & -1 & 4 \\ 1 & 0.75 & 1 & 5 \end{bmatrix},$$

$$|A| = 0, \quad r(A) = 2.$$

However, $r(A_b) = 3$. There is no solution. If the second equation is
added to twice the third, we obtain

$$3x_1 + 2x_2 + x_3 = 14;$$

this is inconsistent with the first equation. What is the geometrical
interpretation?
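
The rank criterion of this section lends itself to a short computational check. The sketch below is an illustration under stated assumptions: `rank` and `has_solution` are hypothetical names, the rank is found by reduction to row echelon form, and coefficients are converted to exact fractions. The decimals appearing in the three examples (0.5, 0.75, 2.25, 5.25) are exactly representable in binary; other decimals should be passed as strings such as "0.1" to stay exact.

```python
from fractions import Fraction

def rank(M):
    """Rank of a matrix, found by reduction to row echelon form."""
    rows = [[Fraction(v) for v in row] for row in M]
    r = 0                                  # number of pivots found so far
    for c in range(len(rows[0])):
        # Find a row at or below r with a nonzero entry in column c.
        p = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if p is None:
            continue
        rows[r], rows[p] = rows[p], rows[r]
        pivot = rows[r][c]
        rows[r] = [v / pivot for v in rows[r]]
        # Reduce the entries below the pivot to zero.
        for i in range(r + 1, len(rows)):
            f = rows[i][c]
            rows[i] = [vi - f * vr for vi, vr in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r

def has_solution(A, b):
    """Return (r(A), r(A_b), consistent) for the system Ax = b."""
    Ab = [list(row) + [rhs] for row, rhs in zip(A, b)]   # augmented matrix
    rA, rAb = rank(A), rank(Ab)
    return rA, rAb, rA == rAb
```

Calling `has_solution` on the coefficient matrices and right-hand sides of Examples (1) to (3) gives the ranks quoted there.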


5-5 Further properties. We have noted in one of the preceding
examples that if a system of linear equations has a solution, this solution
need not be unique. If the system Ax = b has two distinct solutions $x_1$
and $x_2$, then $\lambda x_1 + (1 - \lambda)x_2$ is also a solution for any number $\lambda$. To
prove this, assume that

$$Ax_1 = b, \quad Ax_2 = b. \tag{5-18}$$

Then

$$\begin{aligned} \lambda A x_1 = A(\lambda x_1) &= \lambda b, \\ (1 - \lambda)A x_2 = A(1 - \lambda)x_2 &= (1 - \lambda)b. \end{aligned} \tag{5-19}$$

Adding the two equations, we obtain

$$A[\lambda x_1 + (1 - \lambda)x_2] = \lambda b + (1 - \lambda)b = b. \tag{5-20}$$

Hence, $\lambda x_1 + (1 - \lambda)x_2$ is a solution if $x_1$, $x_2$ are solutions. This result
illustrates immediately that if a system Ax = b has two distinct solutions,
then there exists an infinite number of solutions. This follows since in (5-20)
$\lambda$ can take on any value.
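
A small numerical check of (5-20) may help. The sketch below uses the consistent system $3x_1 + 4x_2 = 7$, $2.25x_1 + 3x_2 = 5.25$; the two particular solutions [1, 1] and [-3, 4], the sample values of $\lambda$, and the helper names `matvec` and `combine` are chosen here purely for illustration.

```python
from fractions import Fraction

def matvec(A, x):
    """Product Ax of a matrix (list of rows) and a vector."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def combine(lam, x1, x2):
    """The vector lam*x1 + (1 - lam)*x2."""
    return [lam * u + (1 - lam) * v for u, v in zip(x1, x2)]

A = [[Fraction(3), Fraction(4)], [Fraction(9, 4), Fraction(3)]]
b = [Fraction(7), Fraction(21, 4)]

x1 = [Fraction(1), Fraction(1)]    # one solution of Ax = b
x2 = [Fraction(-3), Fraction(4)]   # a second, distinct solution

# Every combination lam*x1 + (1 - lam)*x2 again satisfies Ax = b,
# including values of lam outside the interval [0, 1].
for lam in (Fraction(-2), Fraction(0), Fraction(1, 3), Fraction(5)):
    assert matvec(A, combine(lam, x1, x2)) == b
```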

Let us suppose that we are given the system Ax = b, with A an m × n
matrix and

$$r(A) = r(A_b) = k < m. \tag{5-21}$$

The rows of $A_b$ will be denoted by $(a^i, b_i)$. If we choose k rows of A which
are linearly independent, then the same k rows of $A_b$ are also linearly
independent. Let us assume that these are the first k rows of $A_b$. Then,
according to Eq. (5-21), any other row in $A_b$ is a linear combination of
the first k rows, that is,

$$(a^r, b_r) = \sum_{i=1}^{k} \lambda_{ri}\,(a^i, b_i), \quad r = k + 1, \ldots, m; \tag{5-22}$$

or, separating out $b_r$, we obtain

$$a^r = \sum_{i=1}^{k} \lambda_{ri}\, a^i, \tag{5-23}$$

$$b_r = \sum_{i=1}^{k} \lambda_{ri}\, b_i. \tag{5-24}$$

If x satisfies the first k equations of Ax = b, that is, $a^i x = b_i$, i = 1, . . . , k,
then from Eqs. (5-23) and (5-24),

$$a^r x = \sum_{i=1}^{k} \lambda_{ri}\, a^i x = \sum_{i=1}^{k} \lambda_{ri}\, b_i = b_r, \quad r = k + 1, \ldots, m. \tag{5-25}$$

Thus: Any x which satisfies k equations $a^i x = b_i$ for which the
corresponding rows $a^i$ in A are linearly independent satisfies all m equations. In
other words, all but k equations can be ignored when seeking the solutions
to Ax = b.

Let us imagine that $k$ equations have been selected for which the
corresponding rows of $A$ are linearly independent. This assumption implies
that there must be at least $k$ variables, that is, $n \geq k$. If $n = k$,
then the matrix of the coefficients of this set of $k$ equations must be
nonsingular, and, according to Cramer's rule, there is a unique solution.
If $n > k$, the $k$ equations can be written

$$A_1 x_\alpha + R x_\beta = b^*, \tag{5-26}$$

where $A_1$ is a $k \times k$ nonsingular matrix, and $R$ is a $k \times (n - k)$
matrix. Furthermore, $x_\alpha = [x_1, \ldots, x_k]$, $x_\beta = [x_{k+1}, \ldots, x_n]$,
and $b^*$ contains the $k$ components of $b$ corresponding to the $k$ equations
selected. We have named the variables so that the first $k$ variables have
associated with them a nonsingular matrix. Then

$$x_\alpha = A_1^{-1} b^* - A_1^{-1} R x_\beta. \tag{5-27}$$

Hence, given any $x_\beta$, we can solve uniquely for $x_\alpha$ in terms of
$x_\beta$. Therefore, arbitrary values can be assigned to the $n - k$ variables
in $x_\beta$; values for the remaining $k$ variables in $x$ can be found by
Eq. (5-27), so that $x = [x_\alpha, x_\beta]$ is a solution to $Ax = b$. All
solutions to the set of equations can be generated by assigning all possible
values to the set of variables in $x_\beta$.

Summing up the discussion of the preceding paragraph, we see that, in
a system of $m$ simultaneous linear equations in $n$ unknowns, if $r(A) =
r(A_b) = k < m$, then any $x$ which satisfies $k$ of the equations for which the
corresponding rows of $A$ are linearly independent satisfies all equations of
the set. Furthermore, if $k < n$, $n - k$ of the variables can be assigned
arbitrary values, and the remaining $k$ variables can be solved for provided
the columns of $A$ associated with the $k$ variables are linearly independent.
An explicit form of the general solution is given by Eq. (5-27). The reader
should now be able to see why the Gauss reduction procedure discussed
in Section 5-2 might not lead to a unique solution of a set of $n$ equations
in $n$ unknowns. If $r(A) < r(A_b)$, there is no solution; then a nonzero $g_i$
appears in the reduction, while all $h_{ij}$ in that row vanish. When $r(A) =
r(A_b) = k < n$, arbitrary values can be assigned to $n - k$ of the variables.
This can be clearly seen in the Gauss reduction since one or more
rows of $h_{ij}$ will be composed entirely of zeros, and the corresponding
$g_i = 0$. Hence, back substitution will not eliminate all variables. It is
important to note that both the Gauss and the Gauss-Jordan methods
are equally useful in solving numerically sets of $m$ simultaneous linear
equations in $n$ unknowns. Problem 5-24 illustrates this point.

The preceding discussion shows: (1) Whenever $r(A) = r(A_b) < n$ (the
number of variables), an infinite number of solutions will satisfy the
equations since in this case some variables, with arbitrary values assigned
to them, can always be transferred to the right-hand side. (2) There
will be a unique solution if and only if $r(A) = r(A_b) = n$. Thus for a
system of simultaneous linear equations, there is either a unique solution,
an infinite number of solutions, or no solution at all.
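
The three cases can be told apart mechanically by comparing $r(A)$, $r(A_b)$, and $n$. The sketch below is one possible way to do this in Python with exact fractions; the function names `rank` and `classify` and the test systems other than the example of this section are assumptions made for illustration.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix (list of rows) by Gaussian elimination over the rationals."""
    M = [[Fraction(v) for v in row] for row in rows]
    r = 0  # number of pivot rows found so far
    for c in range(len(M[0])):
        pivot = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, len(M)):
            f = M[i][c] / M[r][c]
            M[i] = [a - f * p for a, p in zip(M[i], M[r])]
        r += 1
        if r == len(M):
            break
    return r

def classify(A, b):
    """Classify Ax = b using r(A), r(A_b) and the number of variables n."""
    n = len(A[0])
    Ab = [list(row) + [bi] for row, bi in zip(A, b)]  # augmented matrix (A, b)
    rA, rAb = rank(A), rank(Ab)
    if rA < rAb:
        return "no solution"
    return "unique solution" if rA == n else "infinitely many solutions"

A = [[2, 0, 7], [3, 3, 6], [2, 2, 4]]
print(classify(A, [4, 3, 2]))                 # consistent, r(A) = 2 < 3
print(classify(A, [4, 3, 3]))                 # r(A) = 2 < r(A_b) = 3
print(classify([[1, 1], [1, -1]], [2, 0]))    # r(A) = r(A_b) = n = 2
```

Exact fractions are used so that a rank deficiency is detected as a true zero rather than as a small floating-point residue.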

If $r(A) = r(A_b) = k < m$, $m - k$ of the equations are linear combinations
of the remaining $k$ equations. These $m - k$ equations are called
redundant since they do not place any additional constraints on the
variables; they could be dropped from the set without any effect on the
solutions. When formulating a system of equations, we try to avoid
redundant equations. However, a large system involving many variables
and equations may make it extremely difficult to determine whether any
new equation is linearly independent of the others.

Example: Find a solution to the system

$$\begin{aligned}
2x_1 + 7x_3 &= 4, \\
3x_1 + 3x_2 + 6x_3 &= 3, \\
2x_1 + 2x_2 + 4x_3 &= 2.
\end{aligned}$$

First, let us check whether $r(A) = r(A_b)$ to make sure that a solution does
exist. We note that $r(A) = r(A_b) = 2$. Thus there is a solution. However,
since $r(A) < 3$, the solution will not be unique; there exists an infinite
number of solutions to the system. The determinant of order 2 in the upper
left-hand corner of $A$ does not vanish. Hence, we shall use the first two
equations to solve for $x_1$ and $x_2$ by Cramer's rule, setting $x_3$ to any
arbitrary value. Let us suppose that $x_3 = 2$. Then we must solve

$$\begin{bmatrix} 2 & 0 \\ 3 & 3 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 4 - 7x_3 \\ 3 - 6x_3 \end{bmatrix} = \begin{bmatrix} -10 \\ -9 \end{bmatrix};$$

hence

$$x_1 = \frac{1}{6} \begin{vmatrix} -10 & 0 \\ -9 & 3 \end{vmatrix} = -5, \qquad x_2 = \frac{1}{6} \begin{vmatrix} 2 & -10 \\ 3 & -9 \end{vmatrix} = 2.$$

Thus one solution to the system has the values: $x_1 = -5$, $x_2 = 2$,
$x_3 = 2$. We can check this result by substituting these values into all
three equations. We see that:

Equation 1: $-10 + 14 = 4$.

Equation 2: $-15 + 6 + 12 = 3$.

Equation 3: $-10 + 4 + 8 = 2$.

The second or third equation can be considered redundant since the
second equation is 3/2 of the third.
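
The same computation can be repeated for any value of $x_3$. The sketch below, in Python with exact fractions, applies Cramer's rule to the first two equations and then checks all three; the function names `solve_first_two` and `residuals` are illustrative and do not come from the text.

```python
from fractions import Fraction

def det2(a, b, c, d):
    """Determinant of the 2 x 2 matrix [[a, b], [c, d]]."""
    return a * d - b * c

def solve_first_two(x3):
    """Solve 2*x1 = 4 - 7*x3 and 3*x1 + 3*x2 = 3 - 6*x3 by Cramer's rule."""
    x3 = Fraction(x3)
    r1, r2 = 4 - 7 * x3, 3 - 6 * x3      # right-hand sides after moving x3 over
    d = det2(2, 0, 3, 3)                 # determinant of the coefficients, 6
    x1 = det2(r1, 0, r2, 3) / d
    x2 = det2(2, r1, 3, r2) / d
    return x1, x2, x3

def residuals(x1, x2, x3):
    """Left side minus right side for each of the three equations."""
    return (2 * x1 + 7 * x3 - 4,
            3 * x1 + 3 * x2 + 6 * x3 - 3,
            2 * x1 + 2 * x2 + 4 * x3 - 2)

# The value x3 = 2 reproduces the solution found above.
assert solve_first_two(2) == (-5, 2, 2)

# Any other value of x3 also satisfies all three equations, including the redundant one.
for x3 in (0, 1, Fraction(-7, 3), 10):
    assert residuals(*solve_first_two(x3)) == (0, 0, 0)
```

The third equation is never used to compute the solution, yet its residual is always zero, which illustrates why it may be dropped.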


5-6 Homogeneous linear equations. We shall now examine the special
case of $b = 0$, that is, the right-hand side of Eq. (5-16) vanishes. A system
of linear equations of this type is called homogeneous. It can be
written:

$$\begin{aligned}
a_{11}x_1 + \cdots + a_{1n}x_n &= 0, \\
&\;\;\vdots \\
a_{m1}x_1 + \cdots + a_{mn}x_n &= 0,
\end{aligned} \tag{5-28}$$

or, in matrix form,

$$Ax = 0. \tag{5-29}$$

We see immediately that a set of homogeneous linear equations always has
a solution since, with $b = 0$, it must be true that

$$r(A) = r(A_b).$$

We note also that $x = 0$ is always a solution (called a trivial solution).
It is of interest to determine when solutions other than $x = 0$ exist.
From Section 5-5, we know that if $r(A) = k < n$, arbitrary values can
be assigned to $n - k$ of the variables, and hence a nontrivial solution
always exists. Thus we can prove the following important theorem: A
necessary and sufficient condition for a system of homogeneous linear
equations $Ax = 0$ in $n$ variables to have a solution other than $x = 0$ is that
$r(A) < n$. If $r(A) < n$, then it follows from the above argument that
there is a solution which is not trivial. If $r(A) = n$, then from Eq. (5-27)
$x_\alpha = x$, and

$$x = A_1^{-1} 0 = 0.$$

Thus there is only one solution and it is trivial. This result has two
useful corollaries: (1) If there are fewer equations than unknowns, the system
(5-28) always has a nontrivial solution. (2) If the number of equations is
equal to the number of unknowns, a necessary and sufficient condition for a
nontrivial solution is that the determinant of the coefficients vanish, that is,
$|A| = 0$. We are familiar with this fact from elementary algebra. If we
have a set of $n$ homogeneous linear equations in $n$ unknowns, there is no
solution other than $x = 0$ if $|A| \neq 0$. This result can, of course, also be
obtained directly from (5-11), for

$$x = A^{-1} 0 = 0.$$

However, if $|A| = 0$, then there exists a solution different from $x = 0$.
Let us consider any solution $x \neq 0$ to $Ax = 0$. Since for any scalar $\lambda$,

$$\lambda Ax = A(\lambda x) = \lambda 0 = 0, \tag{5-30}$$

it follows that if $x$ is a solution to the set of equations, so is $\lambda x$. Hence,
the appearance of one nontrivial solution automatically implies the
existence of an infinite number of nontrivial solutions. If $n = 2$ and $x$
is a nontrivial solution, then, geometrically speaking, the fact that $\lambda x$
is a solution means that any point on the line through $x$ and the origin is
also a solution.

The arguments in the preceding paragraph can be pursued further.
In general, we are considering a homogeneous system of $m$ equations in $n$
unknowns. The vector $x$ satisfying $Ax = 0$ is a point in $E^n$. We have
already shown that $x = 0$ is a solution to $Ax = 0$, and that if $x$ is a
solution, so is $\lambda x$. Furthermore, if $x_1$ and $x_2$ are distinct solutions, then
$x_3 = x_1 + x_2$ is also a solution. This follows since

$$Ax_1 = 0, \qquad Ax_2 = 0;$$

adding the two expressions, we obtain

$$Ax_1 + Ax_2 = A(x_1 + x_2) = Ax_3 = 0.$$

Thus we have proved (see definition of a subspace) that the set of all
solutions to $Ax = 0$ forms a subspace of $E^n$. We shall now show that $q$,
the dimension of this subspace, is $n - k$ where $n$, as usual, denotes the
number of columns in $A$ and $k$ is the rank of $A$. To prove this, let
$x_1, \ldots, x_q$ be $q$ vectors which span the $q$-dimensional subspace. According
to Section 2-10, this set of vectors can be extended to form a basis for $E^n$.
Let $y_1, \ldots, y_{n-q}$ be the $n - q$ additional vectors which, along with
$x_1, \ldots, x_q$, form a basis for $E^n$. Then any vector $z$ in $E^n$ can be written
as a linear combination of the basis vectors, that is,

$$z = \sum_{i=1}^{n-q} \sigma_i y_i + \sum_{j=1}^{q} \rho_j x_j.$$

However,

$$Az = \sum_{i=1}^{n-q} \sigma_i A y_i + \sum_{j=1}^{q} \rho_j A x_j = \sum_{i=1}^{n-q} \sigma_i A y_i, \tag{5-31}$$

since by definition

$$Ax_j = 0.$$


Equation (5-31) indicates that the $n - q$ vectors $Ay_i$ span the subspace
of $E^m$ which is generated by the columns of $A$ (see Section 4-2).
Let us show that the vectors $Ay_i$ are linearly independent. We shall
assume that there exist $\lambda_i$ not all zero such that

$$\sum_{i=1}^{n-q} \lambda_i A y_i = A\Big[\sum_{i=1}^{n-q} \lambda_i y_i\Big] = 0.$$

This expression implies that $\sum \lambda_i y_i$ is an element of the subspace
spanned by $x_1, \ldots, x_q$, that is,

$$\sum_{i=1}^{n-q} \lambda_i y_i = \sum_{j=1}^{q} \gamma_j x_j,$$

or

$$\sum_{i=1}^{n-q} \lambda_i y_i - \sum_{j=1}^{q} \gamma_j x_j = 0.$$

This result contradicts the original assumption that the $y_i$ and $x_j$
form a basis for $E^n$. Hence, the vectors $Ay_i$ are linearly independent
and form a basis for the subspace of $E^m$ generated by the columns of $A$.
Since there are $n - q$ vectors, the rank of $A$ is $n - q = k$. Thus we have
proved that

$$q = n - k. \tag{5-32}$$

Consequently, the dimension of the subspace of $E^n$ generated by the
solutions to $Ax = 0$ is $n - k$. We could have guessed this intuitively
since arbitrary values can be assigned to $n - k$ of the variables.

We shall now examine the equation $Ax = 0$ in a slightly different way. Let
us note that $A$ maps $E^n$ into a subspace of $E^m$. The set of vectors $x$
in $E^n$ which are taken into the origin of $E^m$ is the set of solutions to
$Ax = 0$; we have shown that this set of vectors is a subspace of $E^n$.
This collection of vectors is sometimes called the null space of $A$. The
dimension of this subspace of $E^n$, which depends only on $A$, is called the
nullity of $A$. We have proved that the nullity of $A$ plus the rank of $A$ is
equal to the number of columns in $A$, that is, $q + k = n$. The study of
homogeneous linear equations thus reveals another property of matrices and
linear transformations.
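
The relation $q + k = n$ can be observed on small matrices. The following Python sketch (standard library only, exact fractions) computes a basis for the null space from the reduced row echelon form and compares its size with the rank. The function names `rref` and `null_space_basis` are illustrative, and the two test matrices are taken from the examples of Sections 5-6 and 5-7.

```python
from fractions import Fraction

def rref(rows):
    """Return (reduced row echelon form, list of pivot column indices)."""
    M = [[Fraction(v) for v in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(M[0])):
        p = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if p is None:
            continue
        M[r], M[p] = M[p], M[r]
        M[r] = [v / M[r][c] for v in M[r]]          # scale pivot to 1
        for i in range(len(M)):
            if i != r and M[i][c] != 0:
                f = M[i][c]
                M[i] = [a - f * q for a, q in zip(M[i], M[r])]
        pivots.append(c)
        r += 1
        if r == len(M):
            break
    return M, pivots

def null_space_basis(A):
    """Return (basis of the solutions of Ax = 0, rank of A)."""
    n = len(A[0])
    R, pivots = rref(A)
    basis = []
    for f in (c for c in range(n) if c not in pivots):
        v = [Fraction(0)] * n
        v[f] = Fraction(1)                 # one free variable set to 1, the rest 0
        for row_index, p in enumerate(pivots):
            v[p] = -R[row_index][f]        # pivot variables follow from the reduced rows
        basis.append(v)
    return basis, len(pivots)

basis, k = null_space_basis([[2, 3, 4], [1, 2, 2]])
assert k == 2 and len(basis) == 1 and k + len(basis) == 3
assert basis[0] == [-2, 0, 1]

basis2, k2 = null_space_basis([[3, 4], [Fraction(9, 4), 3]])
assert k2 == 1 and basis2[0] == [Fraction(-4, 3), 1]   # proportional to [4, -3]
```

Each free variable contributes exactly one basis vector, so the number of basis vectors is $n$ minus the number of pivots, which is the content of $q = n - k$.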

Let us again view $Ax = 0$ as a set of homogeneous equations. A set
of basis vectors spanning the subspace of $E^n$ generated by the solutions to
$Ax = 0$ is called a fundamental system for $Ax = 0$. Since the dimension
of the subspace is $n - k$, we have $n - k$ vectors in a fundamental system;
and since the number of bases is infinite, an infinite number of
different fundamental systems can be generated.

Example: Illustrate geometrically the subspace formed by the solutions
to

$$\begin{aligned}
3x_1 + 4x_2 &= 0, \\
2.25x_1 + 3x_2 &= 0.
\end{aligned}$$

Since $|A| = 0$ and $r(A) = 1$, there are nontrivial solutions. The second
equation is $\tfrac{3}{4}$ times the first, and hence the general solution is

$$x_2 = -\tfrac{3}{4} x_1.$$

The value of $x_1$ can be chosen arbitrarily. The dimension of the subspace
formed by the solutions is thus 1; it is a line through the origin with slope
$-\tfrac{3}{4}$ (see Fig. 5-1). A single vector forms a basis for this subspace. If
$x_1 = 4$, then $x_2 = -3$; hence $x = [4, -3]$ is a basis vector, and $x$ is a
fundamental system for the set of equations.


Figure 5-1

5-7 Geometric interpretation. We have already seen that the solutions
to a system of $m$ homogeneous linear equations in $n$ unknowns generate a
subspace of $E^n$. This is true even if the only solution is $x = 0$; the single
vector 0 yields a zero-dimensional subspace of $E^n$. If $n = 3$, then the
subspace spanned by the solutions to $Ax = 0$ is either a plane through
the origin, a line through the origin, or the origin itself. (In the
completely degenerate case, when $A = 0$, the subspace is all of $E^3$.)

Examples: (1) The solutions to

$$2x_1 + 3x_2 + 4x_3 = 0$$

lie on a plane through the origin.

(2) The solutions to

$$\begin{aligned}
2x_1 + 3x_2 + 4x_3 &= 0, \\
x_1 + 2x_2 + 2x_3 &= 0
\end{aligned}$$

lie on a line through the origin (the intersection of two planes which pass
through the origin).

(3) The solution of

$$\begin{aligned}
x_1 + 2x_3 &= 0, \\
x_1 + x_2 &= 0, \\
x_1 - 2x_2 &= 0
\end{aligned}$$

is unique and trivial, that is, $x = 0$ is the only solution (the intersection
of three planes which pass through the origin).

Let us consider the geometric interpretation of the solutions to $Ax =
b \neq 0$. In this case, $x = 0$ is not a solution; hence we are sure that the
set of solutions does not form a subspace of $E^n$. However, if we are given
one solution $x_1$ to $Ax = b$, then any other solution $x_2$ can be written

$$x_2 = x_1 + x_2 - x_1 = x_1 + y, \qquad y = x_2 - x_1, \tag{5-33}$$

and

$$Ay = Ax_2 - Ax_1 = b - b = 0. \tag{5-34}$$

Equations (5-33) and (5-34) show that if we know one solution $x_1$ to
$Ax = b$, any other solution $x_2$ can be written $x_2 = x_1 + y$, where $y$ is a
solution to the homogeneous set of equations $Ay = 0$. In other words, all
solutions to $Ax = b$ can be generated by knowing a single solution to
$Ax = b$ and all solutions to the homogeneous system $Ay = 0$.

The preceding results illustrate that the solutions to $Ax = b$ will
generate a space having the same dimension as the subspace spanned by
the solutions of $Ax = 0$. The space spanned by the solutions of $Ax = b$
is not a subspace since 0 is not a solution. The solutions are of the form
$x = x_1 + y$, and $x_1$ translates the space away from the origin.* If $n = 3$,
then the solutions to $Ax = b$ lie on a plane, on a line, or, when the
solution is unique, at a single point, that is, in a space of zero dimension.
In general, the dimension of the space generated by the solutions
of $Ax = b$ is the nullity of $A$.

Examples: (1) The solutions to

$$2x_1 + 3x_2 + 4x_3 = 12$$

lie on the plane shown in Fig. 5-2.

(2) The solutions to

$$\begin{aligned}
2x_1 + 3x_2 + 4x_3 &= 12, \\
x_1 + 2x_2 + 2x_3 &= 4
\end{aligned}$$

lie on the line shown in Fig. 5-3 (the intersection of two planes).
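
The decomposition $x = x_1 + y$ of Eq. (5-33) can be illustrated with the two-equation system of Example (2). In the Python sketch below, the particular solution `x_particular` and the homogeneous solution `y` were worked out by hand for this illustration (they are not given in the text), and `solution` is a hypothetical helper name.

```python
from fractions import Fraction

A = [[2, 3, 4], [1, 2, 2]]
b = [12, 4]

x_particular = [12, -4, 0]   # one solution of Ax = b (take x3 = 0)
y = [-2, 0, 1]               # a solution of the homogeneous system Ay = 0

def matvec(A, x):
    """Return the product A x for a matrix given as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def solution(t):
    """Point on the line of solutions: x_particular + t*y."""
    return [p + Fraction(t) * h for p, h in zip(x_particular, y)]

assert matvec(A, x_particular) == b
assert matvec(A, y) == [0, 0]

# Every point x_particular + t*y satisfies Ax = b; t = 0 gives x_particular itself.
for t in (-2, 0, Fraction(1, 3), 5):
    assert matvec(A, solution(t)) == b
```

Because the nullity of this $A$ is 1, a single parameter `t` sweeps out the whole line of solutions, displaced from the origin by `x_particular`.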

* The set of solutions to $Ax = b$ is often referred to as an affine subspace of
$E^n$. It has the same dimension as the subspace generated by the solutions to
$Ax = 0$, the only difference being that the affine subspace is translated away
from the origin.

5-8 Basic solutions. We now wish to study the solutions to a set of
$m$ equations $Ax = b$ in $n > m$ unknowns, which have as many of the
variables equal to zero as possible. From Section 5-5, we know that if
$r(A) = k$ and we select any $k$ linearly independent columns from $A$, we
can assign arbitrary values to the $n - k$ variables not associated with
these columns. The remaining $k$ variables will be uniquely determined in
terms of the $n - k$ variables. Thus for such a system, we can set $n - k$
variables to zero; the other $k$ variables will, in general, be different from
zero since the equations must be satisfied (in certain cases, however, one
or more of these $k$ variables will be zero). To be specific, we shall assume
that for our set of equations,

$$r(A) = r(A_b) = m. \tag{5-35}$$

This implies that none of the equations is redundant (if there were
redundant equations in the original set, we assume that they have been
dropped).

Then the columns of matrix $A$ can be named so that $A$ can be written

$$A = (B, R), \tag{5-36}$$

where $B$ is an $m \times m$ nonsingular matrix [this follows since $r(A) = m$]
and $R$ is an $m \times (n - m)$ matrix. The vector $x$ can be partitioned as
follows:

$$x = [x_B, x_R], \qquad x_B = [x_1, x_2, \ldots, x_m], \qquad x_R = [x_{m+1}, \ldots, x_n]. \tag{5-37}$$

Thus

$$Ax = Bx_B + Rx_R = b. \tag{5-38}$$

All the solutions to this set of equations can be generated by assigning
arbitrary values to $x_R$. Let us now set $x_R = 0$. Since $B$ has an inverse,
we obtain

$$x_B = B^{-1} b. \tag{5-39}$$

This type of solution to the system of equations is called a basic solution.

Basic solution: Given a system of $m$ simultaneous linear equations in $n$
unknowns, $Ax = b$ ($m < n$) and $r(A) = m$: If any $m \times m$ nonsingular
matrix is chosen from $A$, and if all the $n - m$ variables not associated with
the columns of this matrix are set equal to zero, the solution to the resulting
system of equations is called a basic solution.

A basic solution has no more than $m$ nonzero variables. It can be written
$x = [x_B, 0]$, with $x_B$ given by Eq. (5-39). The $m$ variables which can
be different from zero are called basic variables. Hence, in a basic solution
$n - m$ variables are set equal to zero, and the remaining $m$ variables are
uniquely determined since, by assumption, the matrix of their coefficients
is nonsingular.

The term "basic solution" refers to the fact that the columns of $B$ form
a basis for $E^m$. Basic solutions are of great importance in linear
programming.

How many basic solutions are possible in a system of $m$ equations and
$n$ unknowns? This question is analogous to asking how many combinations
of $n$ variables there are when taken $m$ at a time (the order of the variables
in any basic solution is, of course, irrelevant). This number $N$ is the
standard formula for combinations, that is,

$$N = \frac{n!}{m!\,(n - m)!}; \tag{5-40}$$

it represents the maximum number of possible basic solutions. However,
the columns of $A$ associated with the $m$ basic variables must be linearly
independent so that an inverse exists. Since any $m$ columns from $A$ will
not necessarily be linearly independent, we shall not always obtain the
maximum number of possible basic solutions.

It is of interest to know whether any $x_i$ in the vector $x_B$ is zero. If this
is the case, more than $n - m$ variables will be zero in the solution. When
this happens, we say that the basic solution is degenerate.

Degeneracy: A basic solution to $Ax = b$ is degenerate if one or more
of the basic variables vanishes.

A necessary and sufficient condition for the existence and nondegeneracy of
all possible basic solutions of $Ax = b$ is the linear independence of every set
of $m$ columns from the augmented matrix $A_b = (A, b)$. To prove the
necessity let us suppose that all basic solutions exist and that none is
degenerate. Then for any set of $m$ columns, say $a_1, \ldots, a_m$ of $A$,

$$\sum_{i=1}^{m} x_i a_i = b, \tag{5-41}$$

and no $x_i = 0$. Since we assumed the existence of all basic solutions, every
set of $m$ columns from $A$ must be linearly independent. In Section 2-9
we made the point that any vector in a basis can be replaced by a given
vector $b$ if the coefficient of the vector to be replaced does not vanish in
the expression of $b$ as a linear combination of the basis vectors. According
to Eq. (5-41), $b$ can replace any $a_i$ in the basis. Hence $b$ and any $m - 1$
columns from $A$ are linearly independent. The necessity is proved.

To prove the sufficiency let us suppose that any $m$ columns from $A_b$
are linearly independent. This immediately tells us that all basic
solutions exist. When $b$ is expressed as a linear combination of $a_1, \ldots, a_m$,
we arrive at Eq. (5-41). However, since $a_2, \ldots, a_m, b$ are linearly
independent, the coefficient $x_1$ of $a_1$ cannot vanish, because $b$ can replace $a_1$,
and a basis is maintained. Similarly, since $a_1, a_3, \ldots, a_m, b$ are linearly
independent, the coefficient $x_2$ of $a_2$ cannot vanish. Thus we see that none
of the $x_i$ can vanish for any basic solution. Hence all basic solutions exist
and are nondegenerate.

The preceding theorem has a corollary: A necessary and sufficient
condition for any given basic solution $x_B = B^{-1} b$ to be nondegenerate is the
linear independence of $b$ and every $m - 1$ columns from $B$. If a solution is
nondegenerate, $b$ can replace any column of $B$ and still maintain a basis
since $x_i \neq 0$. Hence any $m - 1$ columns of $B$ and $b$ are linearly
independent. Conversely, if $b$ and any $m - 1$ columns of $B$ are linearly
independent, $b$ can replace any column of $B$ and still maintain a basis. Hence
no $x_i$ can vanish.
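
For $m = 2$ the condition of the theorem reduces to checking that no two columns of $A_b$ are collinear, which can be done with $2 \times 2$ determinants. The Python sketch below does this for the columns of the system used in Example (2) of this section, with its original right-hand side and with the modified one; the helper names `det2` and `all_pairs_independent` are illustrative assumptions.

```python
from itertools import combinations

def det2(u, v):
    """Determinant of the 2 x 2 matrix whose columns are u and v."""
    return u[0] * v[1] - u[1] * v[0]

def all_pairs_independent(columns):
    """True if every pair of the given 2-component columns is linearly independent."""
    return all(det2(u, v) != 0 for u, v in combinations(columns, 2))

# Columns a_1, a_2, a_3 of x1 + 2*x2 + x3 = 4, 2*x1 + x2 + 5*x3 = 5.
cols_A = [(1, 2), (2, 1), (1, 5)]

# With b = [4, 5] every pair of columns of (A, b) is independent,
# so all basic solutions exist and none is degenerate.
assert all_pairs_independent(cols_A + [(4, 5)])

# With b = [4, 8] the vector b is collinear with a_1 = [1, 2],
# so the condition fails and degeneracy must occur.
assert not all_pairs_independent(cols_A + [(4, 8)])
```

For larger $m$ the same test would require the rank of every $m$-column submatrix of $A_b$ rather than a single determinant of order 2.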

Since the condition for the nondegeneracy of a basic solution is quite 
stringent, we may expect to find cases where the condition is violated and 
degeneracy occurs. This is quite true. The possibility of degeneracy 
complicates somewhat the theory of linear programming. 

Examples: (1) When $m = 2$, a basic solution has all but two $x_i$ equal
to zero. Degeneracy will occur if the $b$ vector lies along the same line as
any column $a_i$ from $A$. Not all possible basic solutions will exist if two
columns from $A$ are collinear. Let us consider Fig. 5-4. The system

$$Ax = (a_1, a_2, a_3)x = b_1$$

will be degenerate for any basic solution including $a_3$ since $b_1$ is collinear
with $a_3$, that is, vector $b_1$ can be expressed in terms of $a_3$ alone; hence the
$x_i$ corresponding to the other vector in the basis will vanish. However, all
basic solutions exist. For the system

$$Ax = (a_1, a_2, a_3, a_5)x = b_2,$$

all basic solutions exist and are nondegenerate. The system

$$Ax = (a_1, a_2, a_3, a_4)x = b_2$$

does not have a basic solution involving $a_2$, $a_4$ since they lie on the same
line and do not form a basis. All existing basic solutions, however, are
nondegenerate. In this case, one part of our theorem is violated, namely
the part concerning the existence of all basic solutions.

Figure 5-4

(2) Find the basic solutions to

$$\begin{aligned}
x_1 + 2x_2 + x_3 &= 4, \\
2x_1 + x_2 + 5x_3 &= 5.
\end{aligned}$$

The possible number of basic solutions is

$$\frac{3!}{2!\,1!} = 3.$$

First, we set $x_3 = 0$ and solve for $x_1$ and $x_2$. This yields

$$x_1 = 2, \qquad x_2 = 1.$$

Then we set $x_2 = 0$ and solve for $x_1$ and $x_3$:

$$\begin{bmatrix} 1 & 1 \\ 2 & 5 \end{bmatrix} \begin{bmatrix} x_1 \\ x_3 \end{bmatrix} = \begin{bmatrix} 4 \\ 5 \end{bmatrix}, \qquad \text{or} \qquad x_1 = 5, \quad x_3 = -1.$$

Finally we set $x_1 = 0$ and solve for $x_2$ and $x_3$, that is:

$$x_2 = \tfrac{5}{3}, \qquad x_3 = \tfrac{2}{3}.$$

In this example, all basic solutions exist and none is degenerate. Hence
any two columns of the augmented matrix are linearly independent. The
situation can be changed simply by replacing the 5 in the $b$ vector with
an 8. Thus if $x_3 = 0$, then $x_1 = 4$, $x_2 = 0$; if $x_2 = 0$, then $x_1 = 4$,
$x_3 = 0$. Finally, if $x_1 = 0$,

$$x_2 = \tfrac{4}{3}, \qquad x_3 = \tfrac{4}{3}.$$

In this case all three basic solutions exist, but two are degenerate
(illustrate this graphically).
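
The enumeration carried out by hand in this example can be sketched in Python as follows. The code tries every choice of $m$ columns, skips the choices that are singular, and reports which basic solutions are degenerate. It uses only the standard library with exact fractions; the function names `solve_square`, `basic_solutions`, and `is_degenerate` are illustrative and not part of the text.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

def solve_square(B, b):
    """Solve B x = b exactly by Gauss-Jordan reduction; return None if B is singular."""
    m = len(B)
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(B, b)]
    for c in range(m):
        p = next((i for i in range(c, m) if M[i][c] != 0), None)
        if p is None:
            return None
        M[c], M[p] = M[p], M[c]
        M[c] = [v / M[c][c] for v in M[c]]
        for i in range(m):
            if i != c and M[i][c] != 0:
                f = M[i][c]
                M[i] = [a - f * q for a, q in zip(M[i], M[c])]
    return [M[i][m] for i in range(m)]

def basic_solutions(A, b):
    """Map each usable set of basic column indices to its full solution vector."""
    m, n = len(A), len(A[0])
    found = {}
    for cols in combinations(range(n), m):
        B = [[A[i][j] for j in cols] for i in range(m)]
        xB = solve_square(B, b)
        if xB is None:
            continue                      # these columns do not form a basis
        x = [Fraction(0)] * n             # nonbasic variables are set to zero
        for j, v in zip(cols, xB):
            x[j] = v
        found[cols] = x
    return found

def is_degenerate(cols, x):
    """A basic solution is degenerate if some basic variable vanishes."""
    return any(x[j] == 0 for j in cols)

A = [[1, 2, 1], [2, 1, 5]]
assert comb(3, 2) == 3                    # Eq. (5-40) with n = 3, m = 2

sols = basic_solutions(A, [4, 5])
assert sols[(0, 1)] == [2, 1, 0]
assert sols[(0, 2)] == [5, 0, -1]
assert sols[(1, 2)] == [0, Fraction(5, 3), Fraction(2, 3)]
assert not any(is_degenerate(c, x) for c, x in sols.items())

sols8 = basic_solutions(A, [4, 8])
assert sum(is_degenerate(c, x) for c, x in sols8.items()) == 2
```

Column indices in the code start at 0, so the key `(0, 1)` corresponds to the basic variables $x_1$, $x_2$ of the text.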





References 

The following texts discuss, at least in part, systems of linear equations. 
None, however, deals with basic solutions. 

1. A. C. Aitken, Determinants and Matrices. Edinburgh: Oliver and Boyd,
1948.

2. R. G. D. Allen, Mathematical Economics. London: Macmillan and Co.,
Ltd., 1956.

3. G. Birkhoff and S. MacLane, A Survey of Modern Algebra. New York:
Macmillan, 1941.

4. M. Bôcher, Introduction to Higher Algebra. New York: Macmillan, 1907.

5. P. S. Dwyer, Linear Computations. New York: John Wiley and Sons,
1951.

6. W. L. Ferrar, Algebra. Oxford: Oxford University Press, 1941.

7. F. B. Hildebrand, Introduction to Numerical Analysis. New York:
McGraw-Hill, 1956.

8. F. B. Hildebrand, Methods of Applied Mathematics. New York: Prentice-
Hall, 1952.

9. S. Perlis, Theory of Matrices. Reading, Mass.: Addison-Wesley, 1952.

10. R. M. Thrall and L. Tornheim, Vector Spaces and Matrices. New York:
John Wiley and Sons, 1957.


Problems 

Solve Problems 5-1 through 5-3 by (a) gaussian reduction, (b) Gauss-Jordan 
reduction, (c) Cramer's rule. 

5-1.
$$\begin{aligned}
3x_1 + 2x_2 + 4x_3 &= 7, \\
2x_1 + x_2 + x_3 &= 4, \\
x_1 + 3x_2 + 5x_3 &= 2.
\end{aligned}$$

5-2.
$$\begin{aligned}
x_1 + 2x_2 + 3x_3 + 4x_4 &= 5, \\
2x_1 + x_2 + 4x_3 + x_4 &= 2, \\
3x_1 + 4x_2 + x_3 + 5x_4 &= 6, \\
2x_1 + 3x_2 + 5x_3 + 2x_4 &= 3.
\end{aligned}$$

5-3.
$$\begin{aligned}
4x_1 + 2x_2 + 5x_3 + 7x_4 + x_5 &= 8, \\
x_1 + 4x_2 + x_3 + x_4 + 5x_5 &= 4, \\
2x_1 + 3x_2 + 4x_3 + 5x_4 + 6x_5 &= 3, \\
3x_1 + 9x_2 + 7x_3 + x_4 + 8x_5 &= 16, \\
7x_1 + x_2 + x_3 + 6x_4 + x_5 &= 9.
\end{aligned}$$

5-4. Discuss in detail what happens if the gaussian elimination method is 
used for solving a system of n equations in n unknowns where (a) there is no 
solution, (b) the solution is not unique. 

5-5. Derive Cramer's rule for solving the system

$$\begin{aligned}
a_{11}x_1 + \cdots + a_{1n}x_n &= b_1, \\
&\;\;\vdots \\
a_{n1}x_1 + \cdots + a_{nn}x_n &= b_n
\end{aligned}$$

without use of matrix theory. Hint: Multiply the $i$th equation by $A_{i1}$, the
cofactor of $a_{i1}$, and add the equations.

5-6. Using the fact that $A_b$ can be transformed to an echelon matrix, prove
that there is at least one solution to $Ax = b$ if $r(A) = r(A_b)$, and no solution if
$r(A) < r(A_b)$.

5-7. Show that a collection of $n$-component vectors of the form $x =
(x_1, x_2) = (Bx_2, x_2)$, with $x_1 = Bx_2$, generates a subspace of $E^n$ whose
dimension is the number of components in $x_2$ (it is assumed that no restriction is
placed on the values the components of $x_2$ may assume). Use this result to
prove that the nullity of $A$ is $n - k$ provided that $r(A) = k$ and $A$ is an $m \times n$
matrix.

5-8. Show that a necessary and sufficient condition for a set of $m$ vectors
$x_1, \ldots, x_m$ to be linearly dependent is that the Gram determinant of these
vectors vanish. The Gram determinant $|G|$ is defined as:

$$|G| = \begin{vmatrix} x_1'x_1 & \cdots & x_1'x_m \\ \vdots & & \vdots \\ x_m'x_1 & \cdots & x_m'x_m \end{vmatrix}.$$

*5-9. Consider the set of $m$ homogeneous equations in $n$ unknowns $Ax = 0$.
Let $a^j$ ($j = 1, \ldots, m$) be the rows of $A$. Then the set of equations can be
written $a^j x = 0$ ($j = 1, \ldots, m$), that is, a solution $x$ is orthogonal to every
row of $A$. Using this expression and the concept of an orthogonal basis, show
that if $r(A) = n$, then $x = 0$ is the only solution.

*5-10. Prove that a necessary and sufficient condition for the existence of a 
solution to Ax = b is that b lie in the subspace spanned by the columns of A. 
Show that a necessary condition for the existence of a solution to Ax = b is 
that y'b = 0 for all y such that A'y = 0 . Can you prove that this condition 
is also sufficient? 

In Problems 5-11 through 5-16, determine whether a solution exists. If there 
is a solution, is it unique? Find a solution to every set for which solutions 
(a solution) exist: 

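5-11.
$$\begin{aligned}
3x_1 + 2x_2 &= 7, \\
x_1 + x_2 &= 7.
\end{aligned}$$

5-12.
$$\begin{aligned}
x_1 + 2x_2 + x_3 &= 1, \\
2x_1 + 4x_2 + 5x_3 &= 3.
\end{aligned}$$

5-13.
$$\begin{aligned}
2x_1 + 8x_2 + 7x_3 &= 0, \\
x_1 + 2x_2 + 4x_3 &= 0, \\
2x_1 + 4x_2 + 6x_3 &= 17.
\end{aligned}$$

5-14.
$$\begin{aligned}
2x_1 + 8x_2 + 7x_3 &= 1, \\
x_1 + 2x_2 + 4x_3 &= 0, \\
2x_1 + 4x_2 + 6x_3 &= 0.
\end{aligned}$$

5-15.
$$\begin{aligned}
3x_1 + 7x_2 + 4x_3 &= 0, \\
x_1 + 2x_2 + x_3 &= 0.
\end{aligned}$$

5-16.
$$\begin{aligned}
2x_1 + 3x_2 &= 7, \\
4x_1 + 6x_2 &= 3, \\
x_1 + 17x_2 &= 0.
\end{aligned}$$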

In Problems 5-17 through 5-19, find a fundamental system of solutions: 

5-17. xi 4" 4~ £3 4" £4 = 0 , 5-18. xi 4~ 2 x 2 4” 4 x 4 = 0 , 

2xi 4~ 3x2 + X3 + 5x4 = 0. xi X2 5X3 4“ £4 “ 0, 

2xi 4” 2X2 4" %3 4~ ^4 = 0. 

5-19.
$$\begin{aligned}
x_1 + x_2 &= 0, \\
2x_1 + 3x_2 &= 0, \\
x_1 + 7x_2 &= 0.
\end{aligned}$$

* Starred problems require use of starred material in previous chapters or
material with which all readers may not be familiar.

In Problems 5-20 through 5-22, find all existing basic solutions. Do all 
possible basic solutions actually exist? 

5-20.
$$\begin{aligned}
x_1 + 2x_2 + 3x_3 + 4x_4 &= 7, \\
2x_1 + x_2 + x_3 + 2x_4 &= 3.
\end{aligned}$$

5-21.
$$\begin{aligned}
8x_1 + 6x_2 + 13x_3 + x_4 + x_5 &= 6, \\
9x_1 + x_2 + 2x_3 + 6x_4 + 10x_5 &= 11.
\end{aligned}$$

5-22.
$$\begin{aligned}
2x_1 + 3x_2 + 4x_3 + x_4 &= 2, \\
x_1 + x_2 + 7x_3 + x_4 &= 6, \\
3x_1 + 2x_2 + x_3 + 5x_4 &= 8.
\end{aligned}$$

5-23. Consider the vectors: 



Plot these vectors. Do all possible basic solutions exist to the following sets of 
equations? How many do exist? Are they nondegenerate? 


$$(a_2, a_3, a_4, a_6)x = b_1, \qquad (a_2, a_4, a_5, a_6)x = b_2,$$

$$(a_2, a_4, a_6)x = b_3, \qquad (a_1, a_2, a_4, a_5)x = b_1.$$

5-24. Show that the Gauss-Jordan reduction scheme may be usefully applied
to systems $Ax = b$, where $A$ is $m \times n$ and $m < n$. Illustrate this for equations

$$\begin{aligned}
3x_1 + 2x_2 + x_3 + 4x_4 + 6x_5 &= 2, \\
4x_1 + x_2 + x_3 + 5x_4 + 7x_5 &= 10, \\
x_1 + 9x_2 + 3x_3 + x_4 + x_5 &= 7.
\end{aligned}$$

*5-25. Recall that any solution $x$ of the set of inhomogeneous linear equations
$Ax = b$ can be written $x = x_1 + y$, where $y$ is a solution to the homogeneous
set of equations $Ay = 0$ and $x_1$ is any solution to $Ax = b$. Furthermore, we
can write $y = \sum c_i y_i$ if the $y_i$ form a basis for the null space of $A$. Compare
this with the corresponding results for inhomogeneous linear differential
equations of $n$th order.

5-26. Consider a set of $m$ simultaneous linear equations in $n$ unknowns,
$Ax = b$, with $r(A) = m$. Suppose that a solution to this set of equations has
exactly $m$ nonzero variables. Furthermore, assume that when all variables
other than these $m$ variables are set to zero, the resulting set of equations
uniquely determines the $m$ variables. Show that the solution is a basic solution,
that is, the columns of $A$ corresponding to the $m$ nonzero variables are linearly
independent.

5-27. Consider a set of linearly independent vectors $x_1, \ldots, x_q$ in $E^n$, $q < n$.
Suppose that the vector $y_1$ is linearly independent of the set of vectors $x_j$;
furthermore, the vectors $y_2$ through $y_m$ are linearly independent of the $x_j$.
Assume in addition that the $y_i$ are linearly independent of each other. Prove
that $x_1, \ldots, x_q, y_1, \ldots, y_m$ do not necessarily form a linearly independent set.
Give a counterexample and illustrate it geometrically. This is equivalent to
showing that $\sum_{i=1}^{m} \lambda_i y_i$ is not necessarily linearly independent of the $x_j$,
although each $y_i$ is linearly independent of the $x_j$ and the $y_i$ form a linearly
independent set.

5-28. Show how to routinize the gaussian reduction technique for solving a
set of equations $Ax = b$. In particular, demonstrate that it is never necessary
to use the variables explicitly. The reduction can be carried out by using only the
augmented matrix $A_b = (A, b)$, that is, $A_b$ is reduced to row echelon form. If
the row echelon matrix is $(H, g)$, express the value of the variables in terms of
the elements of this matrix. Discuss the use of the sum check, introduced in
Problem 4-43, for the purpose of avoiding mistakes in solving equations by the
gaussian reduction method.

5-29. Show that the Gauss-Jordan method for solving a set of equations
$Ax = b$ can be used by making appropriate transformations on the matrix
$A_b = (A, b)$; it is never necessary to introduce the variables explicitly. Sketch
the structure of the matrix obtained after stages 1, 2, and 3. Demonstrate that if
the elements of the matrix at the start of stage $s$ are denoted by $u_{ij}$, then the
elements $\hat{u}_{ij}$ at the end of stage $s$ are given by

$$\hat{u}_{sj} = \frac{u_{sj}}{u_{ss}}, \quad \text{all } j; \qquad \hat{u}_{ij} = u_{ij} - \frac{u_{sj}}{u_{ss}}\, u_{is}, \quad \text{all } i \neq s, \text{ all } j.$$

Show that a sum check can be used to guard against numerical mistakes. Solve
Problem 5-2, using the technique suggested in the present problem.

*5-30. We know that a set of $m$ vectors $x_1, \ldots, x_m$ from $E^n$, $m < n$, will be
linearly dependent if the rank of $X = (x_1, \ldots, x_m)$ is less than $m$, that is, if
there is no minor of order $m$ in $X$ which is different from 0. Problem 5-8 showed
that the vectors are linearly dependent if their Gram determinant vanishes.
What is the connection between these two conditions? Hint: $G = X'X$. Use
the theorem on expanding the determinant of the product of two rectangular
matrices.

Problems Involving Complex Numbers 

5-31. List the important results obtained in this chapter. Show that all
these results hold when, in the system of equations $Ax = b$, the elements of
$A_b = (A, b)$ are complex numbers.

5-32. Solve the following set of equations by Cramer's rule:

$$\begin{aligned}
(-5 + 2i)x_1 - 3ix_2 + 4x_3 &= 7 - i, \\
(2 + i)x_1 + (4 - 3i)x_2 + (9 - i)x_3 &= 2, \\
7ix_1 + 5x_2 + (1 + 2i)x_3 &= 4 + 6i.
\end{aligned}$$

5-33. Solve the following set of equations by the Gauss-Jordan method:

$$\begin{aligned}
(4 + i)x_1 + (2 - i)x_2 + (7 + 4i)x_3 &= 2i, \\
5x_1 + (3 + 5i)x_2 + (8 - 9i)x_3 &= 6 - i, \\
3ix_1 + (9 - 7i)x_2 + (2 + i)x_3 &= 8.
\end{aligned}$$

5-34. Find all basic solutions to the set of equations:

$$\begin{aligned}
(5 - 6i)x_1 + 2ix_2 + 5x_3 + (4 + 3i)x_4 &= 16, \\
(3 + 2i)x_1 + (6 - 4i)x_2 + (-3 + 2i)x_3 + (8 - i)x_4 &= 4i.
\end{aligned}$$



CHAPTER 6 


CONVEX SETS AND n-DIMENSIONAL GEOMETRY 

“We can make several things clearer, but we cannot make anything clear.” 

Frank P. Ramsey. 

6-1 Sets. In the preceding chapters, we have on various occasions 
used the notion of a set, for example: a set of n-component vectors. The 
notion of a set is so basic that it is somewhat difficult to define it in terms 
of more fundamental ideas. The following expressions are synonymous: 
(1) a set of elements, (2) a collection of objects, (3) a number of things. 
A set consists of a finite or infinite number of elements. The concept of a 
set is clearly one of great generality; it is also very useful. Set theory was 
first introduced into mathematics by the German mathematician Georg 
Cantor at the end of the 19th century. Since then it has developed into a 
most important branch of mathematics. 

The main topic of this chapter is convex set theory. For many years, 
only a handful of men working in the field of pure mathematics were 
interested in convex sets. Recently, however, the theory has found 
important applications in economics, linear programming, game theory, and 
statistical decision theory. This has stimulated interest in the subject, 
and within the last fifteen years a great deal of work has been done in 
developing the theory and applications of convex sets. Before turning to 
the theory of convex sets, we shall first study briefly some general topics 
of set theory and then develop the fundamentals of point sets. 

Sets will be denoted by capital letters, for example, $A$, $B$. The elements 
of the set will be denoted by $a_i$, $b_i$, etc. Braces { } enclose the elements 
belonging to a set. Thus, the set $A$ can be written 

$A = \{a_i\}$. (6-1) 

Equality: Two sets $A$, $B$ are equal, $A = B$, if they contain the same 
elements, that is, if every element of $A$ is also an element of $B$, and 
conversely. 

The notation $a_i \in A$ indicates that $a_i$ is an element of $A$; $b_i \notin A$ means 
that $b_i$ is not an element of $A$. 

Subset: A subset $B$ of a set $A$ is a set all of whose elements are in $A$. 
However, not all elements of $A$ need to be in subset $B$. $B$ is a proper subset 
of $A$ if $A$ contains at least one element which is not in $B$. 

The notation $B \subset A$ or $A \supset B$ indicates that $B$ is a subset of $A$. 

Intersection: The intersection of sets $A$, $B$, written $A \cap B$, is the set 
$C = \{c_i\}$ containing all elements common to $A$, $B$, that is, $c_i \in A$, $c_i \in B$. 

If there are $m$ sets $A_1, \ldots, A_m$, then $A_1 \cap A_2 \cap \cdots \cap A_m$ is the set 
of elements $C$ common to $A_1, \ldots, A_m$. We can write $C$ as 

$C = \bigcap_{i=1}^{m} A_i$. (6-2) 

Example: If $A = \{1, 2, 3, 5, 8, 15, 21\}$ and $B = \{4, 2, 5, 15, 37, 52\}$, 
then 

$C = A \cap B = \{2, 5, 15\}$, 

since these elements are common to $A$ and $B$. 

Union: The union of two sets, written $A \cup B$, is the set $C = \{c_i\}$ 
containing all elements in either $A$, or $B$, or both. 

If there are $m$ sets $A_1, \ldots, A_m$, then $A_1 \cup A_2 \cup \cdots \cup A_m$ is the set of 
elements $C$ in at least one of $A_1, \ldots, A_m$. $C$ can be written as 

$C = \bigcup_{i=1}^{m} A_i$. (6-3) 

The symbols $\cap$, $\cup$ are sometimes read “cap” and “cup,” respectively. 
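
For finite sets, the operations just defined can be tried directly. The following is a minimal sketch using Python's built-in set type and the two sets from the intersection example above; the variable names are ours and are not part of the text.

```python
# The two finite sets from the intersection example in the text.
A = {1, 2, 3, 5, 8, 15, 21}
B = {4, 2, 5, 15, 37, 52}

intersection = A & B   # elements common to A and B ("cap")
union = A | B          # elements in A, or B, or both ("cup")

# Subset test: every element of the intersection is also in A.
is_subset = intersection <= A
# Proper subset test: A also contains an element not in the intersection.
is_proper_subset = intersection < A

print(sorted(intersection))  # [2, 5, 15]
print(sorted(union))
```

The intersection agrees with $C = \{2, 5, 15\}$ found above, and $C$ is a proper subset of $A$ because $A$ contains elements, such as 1, which are not in $C$.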

6-2 Point sets. Point sets are sets whose elements are points or vectors 
in $E^n$. Since we shall approach our subject from a geometrical standpoint, 
n-tuples will frequently be referred to as points in $E^n$ rather than vectors 
in $E^n$; recall, though, that there is no difference between a point and a 
vector.* Point sets may contain either a finite or an infinite number of 
points. However, the point sets to be considered here will usually contain 
an infinite number of points. 

Point sets are often defined by some property or properties which the 
set of points satisfy. In $E^2$, for example, let us consider the set of points $X$ 
lying inside a circle of unit radius with center at the origin, that is, the 
set of points satisfying the inequality 

$x_1^2 + x_2^2 < 1$. 

A convenient representation for the set $X$ is 

$X = \{[x_1, x_2] \mid x_1^2 + x_2^2 < 1\}$. 

* However, here we prefer to continue to use the term “component” rather 
than “coordinate” for any element in an n-tuple. 

In general, the notation 

$X = \{\mathbf{x} \mid P(\mathbf{x})\}$ (6-4) 

will indicate that the set of points $X = \{\mathbf{x}\}$ has the property (or 
properties) $P(\mathbf{x})$. In the above example the property $P$ is the inequality 
$x_1^2 + x_2^2 < 1$. 

If there are no points with the property $P$, then the set (6-4) contains 
no elements and is called empty or vacuous. An empty set will be denoted 
by $\emptyset$. When discussing the properties of sets, we shall always assume 
that there is at least one element in the set unless otherwise stated. 

Examples: (1) $X = \{[x_1, x_2] \mid x_1^2 + x_2^2 \le 1\}$ is the set of points lying 
inside and on the circumference of the circle of radius unity with its 
center at the origin (compare with the preceding example). 

(2) $X = \{[x_1, x_2] \mid 2x_1 + 3x_2 = 4\}$ is the set consisting of all points 
on the line $2x_1 + 3x_2 = 4$. 

(3) $X = \{[x_1, x_2] \mid x_1 > 0, x_2 > 0, x_1 < 1, x_2 < 1\}$ is the set of points 
inside the square with corners [0, 0], [1, 0], [1, 1], [0, 1]. 

(4) $X = \{[x_1, x_2] \mid x_1^2 + x_2^2 > 1, x_1^2 + x_2^2 < 4\}$ is the set of points 
inside the annulus with its center at the origin, an outer radius of 2, and an 
inner radius of 1. 

The notion of a point set enables us to illustrate geometrically the 
concepts of the union and intersection of sets. Let us define two sets $A$ 
and $B$ by 

$A = \{[x_1, x_2] \mid x_1^2 + x_2^2 \le 1\}$, 

and 

$B = \{[x_1, x_2] \mid (x_1 - 1)^2 + x_2^2 \le 1\}$. 

$A \cap B$ is the shaded region shown in Fig. 6-1; $A \cup B$ is the shaded region 
shown in Fig. 6-2. $A$, of course, is the set of points inside and on the 
circle of radius unity with center at the origin; $B$ is the set of points 
inside and on the circle of radius unity with center at $x_1 = 1$, $x_2 = 0$. 
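
A point set defined by a property $P(\mathbf{x})$ corresponds naturally to a membership test. The sketch below, in which the function names are our own illustrative choices, encodes the two disks $A$ and $B$ defined above and builds the tests for $A \cap B$ and $A \cup B$ from them.

```python
def in_A(x1, x2):
    """Closed disk of radius unity with center at the origin."""
    return x1 ** 2 + x2 ** 2 <= 1


def in_B(x1, x2):
    """Closed disk of radius unity with center at [1, 0]."""
    return (x1 - 1) ** 2 + x2 ** 2 <= 1


def in_intersection(x1, x2):
    """A point is in the intersection if it is in both A and B."""
    return in_A(x1, x2) and in_B(x1, x2)


def in_union(x1, x2):
    """A point is in the union if it is in A, or B, or both."""
    return in_A(x1, x2) or in_B(x1, x2)


print(in_intersection(0.5, 0.0))   # True: the point lies in both disks
print(in_union(-0.5, 0.0))         # True: the point lies in A only
print(in_union(3.0, 0.0))          # False: the point lies in neither disk
```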


Figure 6-1 

Figure 6-2 

According to Section 2-6, the distance between two points $\mathbf{x}$, $\mathbf{a}$ is given 
by 

$|\mathbf{x} - \mathbf{a}| = [(x_1 - a_1)^2 + \cdots + (x_n - a_n)^2]^{1/2}$. (6-5) 

If $\mathbf{a}$ is a fixed point and $\mathbf{x}$ a variable point, then 

$X = \{\mathbf{x} \mid |\mathbf{x} - \mathbf{a}| = \epsilon\}$ (6-6) 

is the set of points $\mathbf{x}$ which are at a given distance $\epsilon$ from $\mathbf{a}$. In $E^2$, the 
condition defining $X$ becomes 

$[(x_1 - a_1)^2 + (x_2 - a_2)^2]^{1/2} = \epsilon$ or $(x_1 - a_1)^2 + (x_2 - a_2)^2 = \epsilon^2$; 

this is the equation of a circle of radius $\epsilon$ with its center at $[a_1, a_2]$. Any 
point on the circle is a distance $\epsilon$ from the center. In $E^3$, Eq. (6-6) 
represents the points on a sphere. In $E^n$, the relation $|\mathbf{x} - \mathbf{a}| = \epsilon$ defines 
a hypersphere. 

Hypersphere: A hypersphere in $E^n$ with center at $\mathbf{a}$ and radius $\epsilon > 0$ 
is defined as the set of points 

$X = \{\mathbf{x} \mid |\mathbf{x} - \mathbf{a}| = \epsilon\}$. (6-7) 

Hence, the equation of a hypersphere in $E^n$ is 

$|\mathbf{x} - \mathbf{a}| = \epsilon$, 

or 

$\sum_{i=1}^{n} (x_i - a_i)^2 = \epsilon^2$. (6-8) 

In the preceding paragraph, we generalized the concept of a circle in $E^2$ 
and a sphere in $E^3$ to what we called a hypersphere in $E^n$. Note that if 
the equations for a circle in $E^2$ or a sphere in $E^3$ are written in vector 
notation, they both become $|\mathbf{x} - \mathbf{a}| = \epsilon$, which is also the vector form of a 
hypersphere in $E^n$. This gives us a key to a very useful procedure for 
generalizing ideas applicable for $E^2$ and $E^3$ to $E^n$: The vector form 
obtained for relations in $E^2$ and $E^3$ will be a suitable definition in $E^n$. 

We shall frequently be interested in points which are “close” to point $\mathbf{a}$, 
that is, which can be considered to be “inside” some hypersphere with its 
center at $\mathbf{a}$. In $E^2$, $E^3$, the inside of a circle or sphere can be represented 
in vector form by the inequality $|\mathbf{x} - \mathbf{a}| < \epsilon$. This immediately suggests 
an appropriate definition for $E^n$. 

Inside: The inside of a hypersphere with center at $\mathbf{a}$ and radius $\epsilon > 0$ 
is the set of points 

$X = \{\mathbf{x} \mid |\mathbf{x} - \mathbf{a}| < \epsilon\}$. (6-9) 

$\epsilon$ Neighborhood: An $\epsilon$ neighborhood about the point $\mathbf{a}$ is defined as the 
set of points expressed by (6-9), that is, the set of points inside the 
hypersphere with center at $\mathbf{a}$ and radius $\epsilon > 0$. 

When discussing $\epsilon$ neighborhoods, we shall often assume that $\epsilon$ is small. 
In general, however, $\epsilon$ can be any positive number whatever. 

In two and three dimensions, the meaning of the terms “interior” and 
“boundary” point of a set is intuitively clear. This intuition fails us in 
$E^n$, and therefore we need analytic definitions to determine for any given 
point whether it can be considered an interior or a boundary point of a 
set. The following definitions are obvious generalizations from $E^2$ and $E^3$. 

Interior point: A point $\mathbf{a}$ is an interior point of the set $A$ if there exists 
an $\epsilon$ neighborhood about $\mathbf{a}$ which contains only points of the set $A$. 

It may be true that for the neighborhood to contain only points in the set, 
$\epsilon$ will have to be very small. However, it is immaterial how small $\epsilon$ is, as 
long as $\epsilon > 0$. An interior point $\mathbf{a}$ must be an element of the set because 
every $\epsilon$ neighborhood of $\mathbf{a}$ contains $\mathbf{a}$. 

Boundary point: A point $\mathbf{a}$ is a boundary point of the set $A$ if every $\epsilon$ 
neighborhood about $\mathbf{a}$ (regardless of how small $\epsilon > 0$ may be) contains 
points which are in the set and points which are not in the set. 

Note that a boundary point does not have to be an element of the set $A$. 
In Fig. 6-3, $\mathbf{a}_1$ is an interior point and $\mathbf{a}_2$ is a boundary point of $A$. 

The concepts of interior and boundary points lead to the notion of open 
and closed sets, since the boundary points may or may not be elements of 
the set. 


Open set: A set $A$ is an open set if it contains only interior points. 

Closed set: A set $A$ is a closed set if it contains all its boundary points. 

Examples: (1) The set 

$X = \{[x_1, x_2] \mid x_1^2 + x_2^2 < 1\}$ 

is open because points on the circumference of the circle are not included. 
For any point $[x_1, x_2]$ in the set, it is possible to find an $\epsilon$ such that all 
points $[x_1', x_2']$ given by $(x_1' - x_1)^2 + (x_2' - x_2)^2 < \epsilon^2$ are in the set. 
A suitable value for $\epsilon$ is one-half the distance from $[x_1, x_2]$ to the point on 
the circumference lying on a radial line through $[x_1, x_2]$. Hence, every 
point is an interior point and the set is open. 
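
The choice of $\epsilon$ in Example (1) can be checked numerically. The following sketch is only a spot check under our own assumptions (the sample point, the number of trial points, and the random seed are arbitrary), not a proof: it computes $\epsilon$ as one-half the radial distance to the circumference and confirms that randomly chosen points of the $\epsilon$ neighborhood remain in the open disk.

```python
import math
import random


def in_open_disk(p):
    """Membership test for X = {[x1, x2] | x1^2 + x2^2 < 1}."""
    return p[0] ** 2 + p[1] ** 2 < 1


def suitable_eps(p):
    """One-half the radial distance from p to the circumference."""
    return (1 - math.hypot(p[0], p[1])) / 2


def sample_neighborhood(p, eps, count, rng):
    """Return `count` random points q with |q - p| < eps."""
    points = []
    for _ in range(count):
        r = eps * rng.random()           # 0 <= r < eps
        t = 2 * math.pi * rng.random()   # direction of the offset
        points.append((p[0] + r * math.cos(t), p[1] + r * math.sin(t)))
    return points


rng = random.Random(0)   # fixed seed, an arbitrary choice for repeatability
p = (0.6, 0.7)           # a point of the set fairly close to the circumference
eps = suitable_eps(p)
neighbors = sample_neighborhood(p, eps, 1000, rng)
all_inside = all(in_open_disk(q) for q in neighbors)
print(eps > 0, all_inside)
```

The closer the point lies to the circumference, the smaller $\epsilon$ becomes, but it remains positive for every point of the set.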

(2) The set 

$X = \{[x_1, x_2] \mid x_1^2 + x_2^2 \le 1\}$ 

is closed since every point on the circumference is a boundary point, 
and all boundary points are included in the set. For any point on the 
circumference, all neighborhoods about the point contain points which are 
in the set and points which are not. 

It should be noted that open and closed sets are not mutually exclusive 
or collectively exhaustive concepts. There are sets which are neither 
open nor closed, such as, for example: 

$X = \{[x_1, x_2] \mid x_1^2 + x_2^2 \le 1$ for $x_2 \ge 0$; $x_1^2 + x_2^2 < 1$ for $x_2 < 0\}$. 

Below the $x_1$-axis, the circumference of the circle is not included; above 
the $x_1$-axis, it is. Thus, not all the boundary points are in the set, and it 
is not closed. However, not every point is an interior point since some 
boundary points are in the set, and hence the set is not open. 

Some sets can be considered to be both open and closed. For example, 
the set containing every point in $E^n$ is open since an $\epsilon$ neighborhood about 
any point contains only points of $E^n$. However, $E^n$ has no boundary 
points, and therefore the set contains all its boundary points and is closed. 

Complement: The complement of any set $A$ in $E^n$, written $\bar{A}$, is the set 
of all points in $E^n$ not in $A$. 

Note that $A \cup \bar{A} = E^n$, $A \cap \bar{A} = \emptyset$; furthermore, if $\mathbf{a}$ is a boundary 
point of $A$, it is a boundary point of $\bar{A}$, and vice versa. Hence, if $A$ is 
closed, $\bar{A}$ is open, and conversely. 

Example: The complement of the set 

$X = \{[x_1, x_2] \mid x_1^2 + x_2^2 < 1\}$ 

is 

$\bar{X} = \{[x_1, x_2] \mid x_1^2 + x_2^2 \ge 1\}$. 

Some sets have the property of being “bounded,” that is, the 
components of the points in the set cannot become arbitrarily large or small. 
We shall define three forms of boundedness: 

Strictly bounded: A set $A$ is strictly bounded if there exists a positive 
number $r$ such that for every $\mathbf{a} \in A$, $|\mathbf{a}| < r$. 

A strictly bounded set lies inside a hypersphere of radius $r$ with its center 
at the origin. 

Bounded from above: The set $A$ is bounded from above if there exists 
an $\mathbf{r}$ with each component finite such that for all $\mathbf{a} \in A$ 

$\mathbf{a} \le \mathbf{r}$. 

A set which is bounded from above has an upper limit on each component 
of every point in the set. 

Bounded from below: The set $A$ is bounded from below if there exists 
an $\mathbf{r}$, with each component finite, such that for each $\mathbf{a} \in A$ 

$\mathbf{r} \le \mathbf{a}$. 

A set which is bounded from below has a lower limit on each component of 
every point in the set. 

Examples: (1) The set $X = \{[x_1, x_2] \mid (x_1 - 3)^2 + (x_2 - 4)^2 < 4\}$ is 
strictly bounded, since every point in the set lies inside the circle of radius 
10 with center at the origin. 

(2) The set $X = \{[x_1, x_2] \mid x_1 > x_2, x_1 > 0, x_2 > 0\}$ is bounded from 
below by the origin. It is not strictly bounded, however, because $x_1$, $x_2$ 
can become arbitrarily large positive numbers. 
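
The claim in Example (1) follows from the triangle inequality: every point of the set is within a distance 2 of the center [3, 4], and the center is a distance 5 from the origin, so no point is farther than 7 from the origin. A short numerical sketch of this bound follows; the variable names and the sampling of 360 points on the bounding circle are our own choices.

```python
import math

center = (3.0, 4.0)
radius = 2.0   # the set is (x1 - 3)^2 + (x2 - 4)^2 < 4 = radius^2
r = 10.0       # the radius quoted in the text

# Triangle inequality: |a| <= |center| + |a - center| < |center| + radius.
norm_bound = math.hypot(*center) + radius

# Spot check on the bounding circle, where |a| comes closest to the bound.
worst = max(
    math.hypot(center[0] + radius * math.cos(t),
               center[1] + radius * math.sin(t))
    for t in (2 * math.pi * k / 360 for k in range(360))
)

print(norm_bound)        # 7.0
print(worst < r)         # True
```

Any $r > 7$ would serve equally well in the definition; the value 10 is simply a convenient choice.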

6-3 Lines and hyperplanes. We shall find considerable use for the 
notion of a line in $E^n$. A suitable definition will be, as usual, the vector 
form obtained by formulating the equation for a line in $E^2$ and $E^3$ in 
vector terms. In Fig. 6-4, consider the two points $\mathbf{x}_1$, $\mathbf{x}_2$ and the line 
passing through them. The vector $\mathbf{x}_2 - \mathbf{x}_1$ is parallel to the line passing 
through $\mathbf{x}_1$, $\mathbf{x}_2$. Any point $\mathbf{x}$ on the line passing through $\mathbf{x}_1$, $\mathbf{x}_2$ can be 
written 

$\mathbf{x} = \mathbf{x}_1 + \lambda(\mathbf{x}_2 - \mathbf{x}_1) = \lambda\mathbf{x}_2 + (1 - \lambda)\mathbf{x}_1$ (6-10) 

for some scalar $\lambda$ (parallelogram law for addition of vectors). Thus 
Eq. (6-10) is the vector form for the line through $\mathbf{x}_1$, $\mathbf{x}_2$ in $E^2$. A similar 
analysis shows that (6-10) represents a line in $E^3$. If Eq. (6-10) is written 
in component form, the ordinary parametric equations for a line are 
obtained. Thus, Eq. (6-10) is used to define a line in $E^n$. 

Line: The line passing through points $\mathbf{x}_1$, $\mathbf{x}_2$ ($\mathbf{x}_1 \neq \mathbf{x}_2$) in $E^n$ is defined 
to be the set of points 

$X = \{\mathbf{x} \mid \mathbf{x} = \lambda\mathbf{x}_2 + (1 - \lambda)\mathbf{x}_1$, all real $\lambda\}$. (6-11) 

The vector equation (6-10) traces a line in $E^n$ as $\lambda$ takes on all possible 
values. 

From Fig. 6-4 we see that any point on the line segment joining $\mathbf{x}_1$, 
$\mathbf{x}_2$ can be written 

$\mathbf{x} = \mathbf{x}_1 + \lambda(\mathbf{x}_2 - \mathbf{x}_1) = \lambda\mathbf{x}_2 + (1 - \lambda)\mathbf{x}_1$, $0 \le \lambda \le 1$. (6-12) 

Thus we make the following general definition: 

Line segment: The line segment joining points $\mathbf{x}_1$, $\mathbf{x}_2$ in $E^n$ is defined 
to be the set of points 

$X = \{\mathbf{x} \mid \mathbf{x} = \lambda\mathbf{x}_2 + (1 - \lambda)\mathbf{x}_1, 0 \le \lambda \le 1\}$. (6-13) 
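
Equations (6-10) through (6-13) translate directly into a computation on components. The sketch below uses function names of our own choosing; it evaluates $\lambda\mathbf{x}_2 + (1 - \lambda)\mathbf{x}_1$ for points in $E^3$ and shows that the parameter $\lambda$ alone decides whether a point of the line lies on the segment.

```python
def point_on_line(x1, x2, lam):
    """Return lam*x2 + (1 - lam)*x1, the point of Eq. (6-10) for scalar lam."""
    return [lam * b + (1 - lam) * a for a, b in zip(x1, x2)]


def on_segment(lam):
    """A point of the line is on the segment joining x1, x2 iff 0 <= lam <= 1."""
    return 0 <= lam <= 1


x1 = [0, 0, 0]
x2 = [2, 4, 6]

print(point_on_line(x1, x2, 0))     # [0, 0, 0], the point x1
print(point_on_line(x1, x2, 1))     # [2, 4, 6], the point x2
print(point_on_line(x1, x2, 0.5))   # [1.0, 2.0, 3.0], the midpoint
print(point_on_line(x1, x2, 2))     # [4, 8, 12], on the line, beyond x2
print(on_segment(0.5), on_segment(2))   # True False
```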

Many sets have the property of being “connected”; that is, their 
elements do not exist in groups which are completely isolated from other 
points in the set by points not belonging to the set. For example, the set 
defined by 

$X = \{[x_1, x_2] \mid x_1^2 + x_2^2 < 1$ or $(x_1 - 5)^2 + (x_2 - 6)^2 < 1\}$ 

is not connected. Some points of the set are inside the circle of radius 
unity with center at the origin, and others inside the circle with radius 
unity and center [5, 6]. These two circles are isolated from each other. 
There is no path leading from one circle to the other while remaining 
entirely within the set. 

Connected set: An open set is connected if any two points in the set 
can be joined by a polygonal path lying entirely within the set. 

A polygonal path joining $\mathbf{x}_1$, $\mathbf{x}_2$ can be defined as follows: Choose any 
$R$ points $\mathbf{y}_1 = \mathbf{x}_1, \mathbf{y}_2, \ldots, \mathbf{y}_{R-1}, \mathbf{y}_R = \mathbf{x}_2$. Form the line segments 
joining $\mathbf{y}_1$ and $\mathbf{y}_2$, $\mathbf{y}_2$ and $\mathbf{y}_3$, ..., $\mathbf{y}_{R-1}$ and $\mathbf{y}_R$. The set of points defined 
by these $R - 1$ line segments is a polygonal path. Intuitively, a polygonal 
path is merely a broken line. A connected set and a typical polygonal path 
are shown in Fig. 6-5. The simplest polygonal path connecting two points 
is the line segment joining these points. 

A set does not need to be open to be connected. However, with closed 
sets, using the polygonal path as the criterion of our definition may land 
us in difficulties. For example: From the preceding discussion we would 
assume that the set of points on the circle $x_1^2 + x_2^2 = 1$ represents a 
connected set. This set is closed (every point is a boundary point), and 
there is no polygonal path lying entirely within the set which connects 
two different points on the circle. Hence, for this closed set, we must 
generalize the notion of the allowed path connecting any two points. 
The path must be what we think of in two and three dimensions as 
an arbitrary continuous curve. We shall not make this generalization. 
For many closed connected sets, it is possible to find a polygonal path 
connecting any two points and therefore to demonstrate that the set is 
connected. 

Region: A region in $E^n$ is a connected set of points in $E^n$. 

A region may or may not contain some or all boundary points of the set. 

In $E^2$, $c_1x_1 + c_2x_2 = z$ ($c_1$, $c_2$, $z$ constant) represents a straight line; 
in $E^3$, $c_1x_1 + c_2x_2 + c_3x_3 = z$ is the equation for a plane. If, in $E^2$, 
we write $\mathbf{c} = (c_1, c_2)$, $\mathbf{x} = [x_1, x_2]$; and, in $E^3$, $\mathbf{c} = (c_1, c_2, c_3)$, $\mathbf{x} = 
[x_1, x_2, x_3]$, then the line in $E^2$ or the plane in $E^3$ can be written as the 
scalar product 

$\mathbf{c}\mathbf{x} = z$. (6-14) 

The equivalent in $E^n$ of a plane in $E^3$ or a line in $E^2$ is a hyperplane; 
it will be defined by Eq. (6-14). 

Hyperplane: A hyperplane in $E^n$ is defined to be the set of points 

$X = \{\mathbf{x} \mid \mathbf{c}\mathbf{x} = z\}$, (6-15) 

with $\mathbf{c} \neq \mathbf{0}$ being a given n-component row vector and $z$ a given scalar. 

If the equation for a hyperplane is written out, we obtain 

$\mathbf{c}\mathbf{x} = c_1x_1 + c_2x_2 + \cdots + c_nx_n = z$, (6-16) 

and any $\mathbf{x}$ satisfying (6-16) lies on the hyperplane. 

A hyperplane passes through the origin if and only if $z = 0$. When 
$z = 0$, Eq. (6-16) becomes 

$\mathbf{c}\mathbf{x} = 0$; (6-17) 

we see that $\mathbf{c}$ is orthogonal to every vector $\mathbf{x}$ on the hyperplane, and hence 
we can say that $\mathbf{c}$ is normal to the hyperplane. If $z \neq 0$, and $\mathbf{x}_1$, $\mathbf{x}_2$ are 
any two distinct points lying on the hyperplane, then 

$\mathbf{c}\mathbf{x}_1 - \mathbf{c}\mathbf{x}_2 = \mathbf{c}(\mathbf{x}_1 - \mathbf{x}_2) = z - z = 0$, 

and $\mathbf{c}$ is orthogonal to every vector $\mathbf{x}_1 - \mathbf{x}_2$ ($\mathbf{x}_1$, $\mathbf{x}_2$ being on the 
hyperplane). In $E^2$ and $E^3$, vector $\mathbf{x}_1 - \mathbf{x}_2$ is parallel to the line or plane, 
respectively. Thus, even with $z \neq 0$, we can say that $\mathbf{c}$ is normal to the 
hyperplane. 

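If $\mathbf{c}\mathbf{x} = z$ is multiplied by the scalar $\lambda \neq 0$, we have $(\lambda\mathbf{c})\mathbf{x} = \lambda z$. The 
same hyperplane is defined by either $\lambda\mathbf{c}$ and $\lambda z$ or $\mathbf{c}$, $z$. Hence if $\mathbf{c}$ is 
normal to a hyperplane, so is $\lambda\mathbf{c}$. Let us suppose that $\lambda = 1/|\mathbf{c}|$. Then if 

$\mathbf{n} = \mathbf{c}/|\mathbf{c}|$, $b = z/|\mathbf{c}|$, 

(6-16) becomes 

$\mathbf{n}\mathbf{x} = b$, (6-18) 

and $|\mathbf{n}| = 1$. The vector $\mathbf{n}$ is called a unit normal to the hyperplane. 
There are two unit normals to any hyperplane; the other is given by $\mathbf{n} = 
-\mathbf{c}/|\mathbf{c}|$. The above discussion can be summarized in the following 
definition for $E^n$: 

Normals: Given the hyperplane $\mathbf{c}\mathbf{x} = z$ in $E^n$ ($\mathbf{c} \neq \mathbf{0}$), then $\mathbf{c}$ is a vector 
normal to the hyperplane. Any vector $\lambda\mathbf{c}$ is also normal to the hyperplane 
($\lambda \neq 0$). The two vectors of unit length $\mathbf{c}/|\mathbf{c}|$, $-\mathbf{c}/|\mathbf{c}|$ are the unit normals 
to the hyperplane. 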

In $E^2$ and $E^3$, the concept of a normal to a hyperplane can be clearly 
illustrated (see Fig. 6-6): Let $\mathbf{n}$ be a vector of unit length lying along the 
line normal to $c_1x_1 + c_2x_2 = z$. Since $\mathbf{n}$ points towards the line, moving 
the line $\mathbf{c}\mathbf{x} = z$ parallel to itself in the direction of $\mathbf{n}$ increases its distance 
from the origin. It will be noted that for any $\mathbf{x}$ on the line $\mathbf{c}\mathbf{x} = z$, 
$b = |\mathbf{x}| \cos\theta$ or 

$\mathbf{n}\mathbf{x} = b = |\mathbf{x}| \cos\theta$; (6-19) 

this is the equation of the line. If we multiply Eq. (6-19) by a $\lambda$ such 
that $z = \lambda b$, we see that $\lambda\mathbf{n} = \mathbf{c}$, and $\lambda = \pm|\mathbf{c}|$. Hence $\mathbf{c}$ is normal to 
the line. The same reasoning applies to a plane in $E^3$. This discussion 
also demonstrates that, in $E^2$ or $E^3$, $b$ is the distance of the line or plane 
from the origin. Similarly, in $E^n$, $|z|/|\mathbf{c}|$ ($|z|$ is the absolute value of $z$) is 
the distance of the hyperplane from the origin. 
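
The unit normals and the distance $|z|/|\mathbf{c}|$ are easily computed for a particular hyperplane. The following sketch uses the line $3x_1 + 4x_2 = 10$ in $E^2$, an example of our own chosen so that $|\mathbf{c}| = 5$; the helper names are likewise ours.

```python
import math


def dot(u, v):
    """Scalar product of two vectors given as sequences of components."""
    return sum(a * b for a, b in zip(u, v))


def norm(u):
    """Length |u| of a vector."""
    return math.sqrt(dot(u, u))


def unit_normals(c):
    """Return the two unit normals c/|c| and -c/|c| of a hyperplane cx = z."""
    length = norm(c)
    n = [ci / length for ci in c]
    return n, [-ni for ni in n]


def distance_from_origin(c, z):
    """Distance |z|/|c| of the hyperplane cx = z from the origin."""
    return abs(z) / norm(c)


c = [3.0, 4.0]
z = 10.0
n_plus, n_minus = unit_normals(c)
b = z / norm(c)
foot = [b * ni for ni in n_plus]   # the point b*n, reached from the origin along n

print(n_plus)                       # approximately [0.6, 0.8]
print(distance_from_origin(c, z))   # 2.0
print(dot(c, foot))                 # approximately 10, so b*n lies on the line
```

The point $b\mathbf{n}$ lies on the hyperplane and is at a distance $|b|$ from the origin, which is the geometric content of Eq. (6-18).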

If we move any line in $E^2$ parallel to itself in the direction of $\mathbf{n}$ (see 
Fig. 6-6), $b$ and the distance of the line from the origin increase. Let us 
consider the line $\mathbf{c}\mathbf{x} = z$: If $z > 0$, then $\mathbf{c}$ points in the same direction 
as $\mathbf{n}$; hence, moving the line parallel to itself in the direction of $\mathbf{c}$ increases $z$. 
However, if $z < 0$, $\mathbf{c}$ points in the direction opposite to $\mathbf{n}$; moving the 
line parallel to itself in the direction of $\mathbf{c}$ decreases the absolute value of $z$ 
(moves the line closer to the origin), but increases its algebraic value. 
Thus both alternatives result in an algebraic increase of $z$. 

Example: Consider $2x_1 + 3x_2 = -6$ in Fig. 6-7. A normal vector 
to the line is $\mathbf{c} = (2, 3)$. Line $2x_1 + 3x_2 = 0$, obtained by moving the 
first line parallel to itself in the direction of $\mathbf{c}$, has indeed a greater algebraic 
value of $z$. The slope is, of course, the same for both lines. 

The intuitive concept of moving a line or plane parallel to itself in order 
to increase the algebraic value of $z$ can easily be generalized to $E^n$. First, 
we define the notion of two parallel hyperplanes by direct extension from 
$E^2$ and $E^3$. 

Figure 6-6 

Figure 6-7 

Parallel hyperplanes: Two hyperplanes are parallel if they have the 
same unit normal. 

Thus the hyperplanes $\mathbf{c}_1\mathbf{x} = z_1$, $\mathbf{c}_2\mathbf{x} = z_2$ are parallel if $\mathbf{c}_1 = \lambda\mathbf{c}_2$, $\lambda \neq 0$. 

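To understand the concept of moving a hyperplane parallel to itself, let 
us consider the hyperplane with normal $\mathbf{c}$ passing through $\mathbf{x}_0$, that is, 
$\mathbf{c}\mathbf{x} = z_0$ where $\mathbf{c}\mathbf{x}_0 = z_0$. Let us find the value of $z$ for a hyperplane with 
normal $\mathbf{c}$ passing through the point $\mathbf{x}_1 = \mathbf{x}_0 + \lambda\mathbf{c}'$, $\lambda > 0$ (we use $\mathbf{c}'$ 
since $\mathbf{c}$ is assumed to be a row vector). When $\mathbf{x}_1 = \mathbf{x}_0 + \lambda\mathbf{c}'$, $\lambda > 0$, 
we can say that in going from $\mathbf{x}_0$ to $\mathbf{x}_1$, we move in the direction of $\mathbf{c}$ 
(illustrate this geometrically). The hyperplane through $\mathbf{x}_1$ is $\mathbf{c}\mathbf{x} = z_1$, 
where $\mathbf{c}\mathbf{x}_1 = z_1$. But 

$\mathbf{c}\mathbf{x}_1 = \mathbf{c}(\mathbf{x}_0 + \lambda\mathbf{c}') = z_0 + \lambda|\mathbf{c}|^2$. 

Therefore, 

$z_1 = z_0 + \lambda|\mathbf{c}|^2$, (6-20) 

and 

$z_1 > z_0$, since $\lambda > 0$, $|\mathbf{c}| \neq 0$. 

Hence $z_1$ is algebraically greater than $z_0$. Furthermore, if $\mathbf{x}_0$ is any point 
on the hyperplane $\mathbf{c}\mathbf{x} = z_0$, then $\mathbf{x}_1 = \mathbf{x}_0 + \lambda\mathbf{c}'$ lies on the hyperplane 
$\mathbf{c}\mathbf{x} = z_1$. Thus we say the hyperplane $\mathbf{c}\mathbf{x} = z_0$ has been moved parallel 
to itself in the direction of $\mathbf{c}$ to yield $\mathbf{c}\mathbf{x} = z_1$. Points lying on the 
hyperplane $\mathbf{c}\mathbf{x} = z_1$ satisfy the inequality 

$\mathbf{c}\mathbf{x} > z_0$, (6-21) 

that is, moving a hyperplane parallel to itself in the direction of $\mathbf{c}$ moves 
it in the “greater than” direction. 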
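
Equation (6-20) can be verified numerically. The sketch below assumes the line $2x_1 + 3x_2 = -6$ of the earlier example, the point $\mathbf{x}_0 = [0, -2]$ on it, and $\lambda = 0.5$; the point, the value of $\lambda$, and the function names are our own choices.

```python
import math


def dot(u, v):
    """Scalar product of two vectors given as sequences of components."""
    return sum(a * b for a, b in zip(u, v))


def move_along_normal(c, x0, lam):
    """Return x1 = x0 + lam*c', the point reached by moving from x0 along c."""
    return [xi + lam * ci for xi, ci in zip(x0, c)]


c = [2.0, 3.0]
x0 = [0.0, -2.0]        # lies on 2*x1 + 3*x2 = -6
lam = 0.5               # any lam > 0 moves the hyperplane in the direction of c

z0 = dot(c, x0)
x1 = move_along_normal(c, x0, lam)
z1 = dot(c, x1)

print(z0, z1)                                 # -6.0 0.5
print(math.isclose(z1, z0 + lam * dot(c, c)))  # True, as Eq. (6-20) requires
```

Here $|\mathbf{c}|^2 = 13$, so $z$ increases by $0.5 \times 13 = 6.5$, from $-6$ to $0.5$.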

Note: In linear programming the function to be optimized, $z = \sum_j c_jx_j$, 
is a hyperplane for a given value of $z$. Since the $c_j$ are constants, the 
hyperplanes corresponding to different values of $z$ have the same normal and are 
therefore parallel. Hence, to maximize $z$, we move this hyperplane parallel 
to itself over the region representing the feasible solutions until $z$ is made 
algebraically as large as possible. If $z$ is positive, we move the hyperplane 
in the direction of $\mathbf{c}$ until it is as far away from the origin as possible. If $z$ 
is negative, then the hyperplane is moved as close to the origin as possible. 

We have noted previously that any two different points in $E^n$ can be 
used to define a line in $E^n$. In order to determine a hyperplane, it is 
necessary to specify the $n$ components of $\mathbf{c}$ and $z$, that is, $n + 1$ parameters. 
These are determined only up to a multiplicative constant, however; 
that is, $\lambda c_i$ ($i = 1, \ldots, n$), $\lambda z$ yield the same hyperplane as the $c_i$ and $z$. 
Thus, there are only $n$ independent parameters, and hence $n$ points in $E^n$ 
will be needed to determine a hyperplane; however, not any arbitrary set 
of $n$ points will provide a unique definition. Only a set of $n$ points 
$\mathbf{x}_1, \ldots, \mathbf{x}_n$ which can be numbered so that the $n - 1$ vectors $\mathbf{x}_1 - \mathbf{x}_n$, 
$\mathbf{x}_2 - \mathbf{x}_n, \ldots, \mathbf{x}_{n-1} - \mathbf{x}_n$ are linearly independent, will describe a unique 
hyperplane. (The $n$ points $\mathbf{x}_1, \ldots, \mathbf{x}_n$ may or may not form a set of $n$ 
linearly independent vectors; it is the set of differences $\mathbf{x}_1 - \mathbf{x}_n$, etc., 
which is of importance.) 

To prove this let us consider the set of equations 

$\mathbf{c}\mathbf{x}_i - z = 0$, $i = 1, \ldots, n$. (6-22) 

This expression represents a set of $n$ homogeneous linear equations in 
$n + 1$ unknowns, the $c_i$ and $z$. If the $n$th equation is used to eliminate $z$, 
we obtain a new set of $n - 1$ homogeneous equations in $n$ unknowns, 
the $c_i$: 

$\mathbf{c}(\mathbf{x}_i - \mathbf{x}_n) = 0$, $i = 1, \ldots, n - 1$. (6-23) 

If the vectors $\mathbf{x}_i - \mathbf{x}_n$ are linearly independent, the matrix of the 
coefficients $\|x_{ji} - x_{jn}\|'$ has rank $n - 1$ and nullity 1. Thus, there is a 
solution to (6-23) with not all $c_i = 0$, and with only one degree of freedom, 
that is, the $c_i$ are determined up to a multiplicative constant. Then 
the $n$th equation of (6-22) uniquely determines $z$ for any $\mathbf{c}$. We have 
proved that $n$ points in $E^n$ for which the vectors $\mathbf{x}_1 - \mathbf{x}_n, \ldots, \mathbf{x}_{n-1} - \mathbf{x}_n$ 
are linearly independent uniquely define a hyperplane in $E^n$. If the rank 
of $\|x_{ji} - x_{jn}\|$ is $n - k$, then, according to Section 5-6, we can determine 
$k$ linearly independent vectors $\mathbf{c}$ which satisfy (6-23); in this case the 
chosen points lie in the intersection of $k$ hyperplanes. Furthermore, the 
$k$ hyperplanes are not unique if $k \ge 2$; there exists an infinite number of 
sets of $k$ hyperplanes whose intersection contains the $n$ points. 
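
In $E^3$ the system (6-23) consists of two equations in three unknowns, and a solution $\mathbf{c}$ orthogonal to both differences is furnished by their vector (cross) product. The sketch below relies on this device, which is special to $E^3$ and is our own choice rather than the general method of Section 5-6; the function names and sample points are also ours.

```python
def sub(u, v):
    """Difference u - v of two vectors."""
    return [a - b for a, b in zip(u, v)]


def dot(u, v):
    """Scalar product of two vectors."""
    return sum(a * b for a, b in zip(u, v))


def cross(u, v):
    """Vector product in E^3; the result is orthogonal to both u and v."""
    return [u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0]]


def plane_through(p1, p2, p3):
    """Return (c, z) with c*(p_i - p3) = 0 for i = 1, 2 and z = c*p3.

    If c comes out as the zero vector, the differences p1 - p3 and p2 - p3
    are linearly dependent and the points do not determine a unique plane.
    """
    c = cross(sub(p1, p3), sub(p2, p3))
    return c, dot(c, p3)


c, z = plane_through([1, 0, 0], [0, 1, 0], [0, 0, 1])
print(c, z)   # [1, 1, 1] 1, that is, the plane x1 + x2 + x3 = 1

# Three points on one line: the differences are linearly dependent.
c_bad, _ = plane_through([0, 0, 0], [1, 1, 1], [2, 2, 2])
print(c_bad)  # [0, 0, 0]
```

Any nonzero multiple of the computed $\mathbf{c}$ and $z$ describes the same plane, in agreement with the remark that the $c_i$ are determined only up to a multiplicative constant.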

Let us consider any point $\mathbf{x}$ in $E^n$ such that $\mathbf{x} - \mathbf{x}_n$ can be written as a 
linear combination of the $n - 1$ linearly independent vectors $\mathbf{x}_1 - \mathbf{x}_n, 
\ldots, \mathbf{x}_{n-1} - \mathbf{x}_n$. Then $\mathbf{x}$ lies on the hyperplane determined by the 
points $\mathbf{x}_1, \ldots, \mathbf{x}_n$ since, by Eq. (6-23), $\mathbf{c}(\mathbf{x} - \mathbf{x}_n) = 0$ or $\mathbf{c}\mathbf{x} = \mathbf{c}\mathbf{x}_n = z$. 
Furthermore, any point $\mathbf{x}$ on the hyperplane has the property that $\mathbf{x} - \mathbf{x}_n$ 
can be written as a linear combination of $\mathbf{x}_1 - \mathbf{x}_n, \ldots, \mathbf{x}_{n-1} - \mathbf{x}_n$. If 
this were not true, $\mathbf{x} - \mathbf{x}_n$ would be linearly independent of $\mathbf{x}_1 - \mathbf{x}_n, \ldots, 
\mathbf{x}_{n-1} - \mathbf{x}_n$; then if the equation $\mathbf{c}(\mathbf{x} - \mathbf{x}_n) = 0$ were annexed to (6-23), 
the rank of the matrix of the coefficients would be $n$, and the only solution 
would be $\mathbf{c} = \mathbf{0}$. Hence, if we choose any $n$ points on a hyperplane with 
the property that $\mathbf{x}_1 - \mathbf{x}_n, \ldots, \mathbf{x}_{n-1} - \mathbf{x}_n$ are linearly independent, then 
any other point $\mathbf{x}$ on the hyperplane is such that $\mathbf{x} - \mathbf{x}_n$ can be written as 
a linear combination of $\mathbf{x}_1 - \mathbf{x}_n, \ldots, \mathbf{x}_{n-1} - \mathbf{x}_n$ or, equivalently, $\mathbf{x}$ can 
be expressed as a linear combination of $\mathbf{x}_1, \ldots, \mathbf{x}_n$. 

We can choose $\mathbf{x}_n = \mathbf{0}$ for a hyperplane through the origin. Then any 
$n - 1$ linearly independent points $\mathbf{x}_1, \ldots, \mathbf{x}_{n-1}$ on the hyperplane can 
be used to determine $\mathbf{c}$, and every other point on the hyperplane can be 
written as a linear combination of $\mathbf{x}_1, \ldots, \mathbf{x}_{n-1}$. 

A hyperplane $\mathbf{c}\mathbf{x} = z$ in $E^n$ divides all of $E^n$ into three mutually 
exclusive and collectively exhaustive sets. These are 

(1) $X_1 = \{\mathbf{x} \mid \mathbf{c}\mathbf{x} < z\}$, (6-24a) 

(2) $X_2 = \{\mathbf{x} \mid \mathbf{c}\mathbf{x} = z\}$, (6-24b) 

(3) $X_3 = \{\mathbf{x} \mid \mathbf{c}\mathbf{x} > z\}$. (6-24c) 

Open half-spaces: The sets $X_1 = \{\mathbf{x} \mid \mathbf{c}\mathbf{x} < z\}$ and $X_3 = \{\mathbf{x} \mid \mathbf{c}\mathbf{x} > z\}$ 
are called open half-spaces. 

In $E^2$, $E^3$ a half-space is all of $E^2$ or $E^3$ lying on one side of a line or plane, 
respectively. 

Closed half-spaces: The sets $X_4 = \{\mathbf{x} \mid \mathbf{c}\mathbf{x} \le z\}$ and $X_5 = \{\mathbf{x} \mid \mathbf{c}\mathbf{x} \ge z\}$ 
are called closed half-spaces. 

Note: $X_4 \cap X_5 = X_2$, that is, if $\mathbf{x}$ is on $\mathbf{c}\mathbf{x} = z$, it is in both closed 
half-spaces. However, no point can be in more than one of the sets $X_1$, $X_2$, $X_3$, 
and every point is in one of these sets. Furthermore $X_1 \cap X_3 = \emptyset$. 
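
Since the three sets (6-24) are mutually exclusive and collectively exhaustive, every point can be assigned to exactly one of them by comparing $\mathbf{c}\mathbf{x}$ with $z$. A minimal sketch follows, using the line $2x_1 + 3x_2 = -6$ of the earlier example; the function name and the string labels are our own.

```python
def classify(c, z, x):
    """Return which of X1 (cx < z), X2 (cx = z), X3 (cx > z) contains x."""
    value = sum(ci * xi for ci, xi in zip(c, x))
    if value < z:
        return "X1"
    if value > z:
        return "X3"
    return "X2"


c, z = (2, 3), -6

print(classify(c, z, (0, 0)))     # X3: the origin gives cx = 0 > -6
print(classify(c, z, (0, -2)))    # X2: the point lies on the line
print(classify(c, z, (-3, -3)))   # X1: cx = -15 < -6
```

A point is in the closed half-space $X_4$ when the result is X1 or X2, and in $X_5$ when the result is X2 or X3. Integer data are used here so that the test for equality is exact; with floating-point components a tolerance would be needed.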

It is easy to see that hyperplanes are closed sets. Choose any point $\mathbf{x}_0$ 
on the hyperplane $\mathbf{c}\mathbf{x} = z$. Form an $\epsilon$ neighborhood about the point $\mathbf{x}_0$ 
and consider the point $\mathbf{x}_1 = \mathbf{x}_0 + (\epsilon/2)(\mathbf{c}'/|\mathbf{c}|)$. The point $\mathbf{x}_1$ is in the 
$\epsilon$ neighborhood since $|\mathbf{x}_1 - \mathbf{x}_0| = \epsilon/2 < \epsilon$. However, 

$\mathbf{c}\mathbf{x}_1 = \mathbf{c}\mathbf{x}_0 + \dfrac{\epsilon}{2}|\mathbf{c}| = z + \dfrac{\epsilon}{2}|\mathbf{c}| > z$, 

and $\mathbf{x}_1$ is not on the hyperplane. This holds true for every $\epsilon > 0$. 
Therefore, every point on a hyperplane is a boundary point. A hyperplane has 
no interior points. Furthermore, there are no boundary points for a 
hyperplane which are not on the hyperplane. A point in either open 
half-space cannot be a boundary point for the hyperplane since it is always 
possible to find an $\epsilon$ neighborhood about such a point, with all points in 
the neighborhood being in the open half-space. To show this explicitly, 
let us suppose $\mathbf{c}\mathbf{x}_1 = z_1 < z$. Take $\epsilon = (z - z_1)/2|\mathbf{c}|$. For any point 
$\mathbf{x}$ in this $\epsilon$ neighborhood of $\mathbf{x}_1$, 

$\mathbf{c}\mathbf{x} = \mathbf{c}\mathbf{x}_1 + \mathbf{c}(\mathbf{x} - \mathbf{x}_1) \le z_1 + |\mathbf{c}(\mathbf{x} - \mathbf{x}_1)|$, 

and by the Schwarz inequality, 

$\mathbf{c}\mathbf{x} \le z_1 + |\mathbf{c}|\,|\mathbf{x} - \mathbf{x}_1| < z_1 + |\mathbf{c}|\epsilon = z_1 + \dfrac{z - z_1}{2} < z$. 

Hence every point in this $\epsilon$ neighborhood of $\mathbf{x}_1$ is in the half-space and $\mathbf{x}_1$ 
cannot be a boundary point of the hyperplane. Consequently, a 
hyperplane contains all its boundary points and is a closed set. 

A closed half-space is also a closed set. The boundary points of the 
half-space $X_4$ or $X_5$ are all the points in the set $X_2 = \{\mathbf{x} \mid \mathbf{c}\mathbf{x} = z\}$, because 
every $\epsilon$ neighborhood of any point on $\mathbf{c}\mathbf{x} = z$ contains points in both 
closed half-spaces, that is, every point on $\mathbf{c}\mathbf{x} = z$ is a boundary point of 
either closed half-space. The preceding discussion has also demonstrated 
that for any point in an open half-space there exists an $\epsilon$ neighborhood 
which contains only points in the open half-space. Hence, no such point 
can be a boundary point of the closed half-space. However, a closed 
half-space contains all the points $\mathbf{c}\mathbf{x} = z$, that is, all its boundary points, and 
is a closed set. 

6-4 Convex sets. 

Convex set: A set $X$ is convex if for any points $\mathbf{x}_1$, $\mathbf{x}_2$ in the set, the line 
segment joining these points is also in the set. 

This definition implies that if $\mathbf{x}_1, \mathbf{x}_2 \in X$, then every point 

$\mathbf{x} = \lambda\mathbf{x}_2 + (1 - \lambda)\mathbf{x}_1$, $0 \le \lambda \le 1$, 

must also be in the set. It is immediately obvious that a convex set is 
connected because the polygonal path joining the points is the line segment 
joining the points. Hence, a convex set is a region. By convention, we say 
that any set containing only one point is convex. 

The expression $(1 - \lambda)\mathbf{x}_1 + \lambda\mathbf{x}_2$, $0 \le \lambda \le 1$, is often referred to as a 
convex combination of $\mathbf{x}_1$, $\mathbf{x}_2$ (for a given $\lambda$). A set is convex if every 
convex combination of any two points in the set is also in the set. 

Intuitively, a convex set cannot have any “holes” in it, that is, it is 
“solid,” and not “re-entrant,” i.e., its boundaries are always “flat” or “bent 
away” from the set. These intuitive ideas, of course, are all rigorously 
expressed in the definition of a convex set. 

Extreme point: A point $\mathbf{x} \in X$ is an extreme point of the convex set $X$ 
if and only if there do not exist points $\mathbf{x}_1$, $\mathbf{x}_2$ ($\mathbf{x}_1 \neq \mathbf{x}_2$) in the set such that 

$\mathbf{x} = (1 - \lambda)\mathbf{x}_1 + \lambda\mathbf{x}_2$, $0 < \lambda < 1$. (6-25) 

Note that strict inequalities are imposed on $\lambda$. The definition stipulates 
that an extreme point cannot be “between” any other two points of the 
set, that is, it cannot be on the line segment joining the points 
($0 < \lambda < 1$). Clearly, an extreme point is a boundary point of the set. 
To prove this, let us suppose that $\mathbf{x}_0$ is any interior point of $X$. Then there 
is an $\epsilon > 0$ such that every point in this $\epsilon$ neighborhood of $\mathbf{x}_0$ is in the set. 
Let $\mathbf{x}_1 \neq \mathbf{x}_0$ be a point in the $\epsilon$ neighborhood. Consider the point 
(illustrate this geometrically) 

$\mathbf{x}_2 = -\mathbf{x}_1 + 2\mathbf{x}_0$, $|\mathbf{x}_2 - \mathbf{x}_0| = |\mathbf{x}_1 - \mathbf{x}_0|$; 

then $\mathbf{x}_2$ is in the $\epsilon$ neighborhood. Furthermore, 

$\mathbf{x}_0 = \tfrac{1}{2}(\mathbf{x}_1 + \mathbf{x}_2)$; 

and hence $\mathbf{x}_0$ is not an extreme point. 
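
The construction used in this proof can be followed with numbers. The sketch below assumes that $X$ is the closed disk of radius unity about the origin and picks an interior point $\mathbf{x}_0$, a radius $\epsilon$, and a neighbor $\mathbf{x}_1$ of our own choosing; it then forms $\mathbf{x}_2 = -\mathbf{x}_1 + 2\mathbf{x}_0$ and confirms that $\mathbf{x}_0$ is the midpoint of $\mathbf{x}_1$, $\mathbf{x}_2$.

```python
import math


def reflect_through(x0, x1):
    """Return x2 = -x1 + 2*x0, the mirror image of x1 through x0."""
    return [2 * a - b for a, b in zip(x0, x1)]


def distance(u, v):
    """Distance |u - v| between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


x0 = [0.25, 0.5]   # an interior point of the closed unit disk (our choice)
eps = 0.3          # this eps neighborhood of x0 lies entirely inside the disk
x1 = [0.5, 0.5]    # a point of the neighborhood different from x0

x2 = reflect_through(x0, x1)
midpoint = [(a + b) / 2 for a, b in zip(x1, x2)]

print(x2)                                      # [0.0, 0.5]
print(distance(x0, x1), distance(x0, x2))      # 0.25 0.25, both less than eps
print(midpoint == x0)                          # True: x0 is between x1 and x2
```

Because $\mathbf{x}_0$ is expressed by (6-25) with $\lambda = 1/2$ and two distinct points of the set, it cannot be an extreme point.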

Not all boundary points of a convex set are necessarily extreme points. 
Some boundary points may lie between two other boundary points. 

If a convex set contains only a single point, this point will be considered 
an extreme point. 


Examples: ( 1 ) A triangle and its interior form a convex set. The 
vertices of the triangle are its only extreme points, since they do not lie be¬ 
tween any other two points of the set. The other points on the sides of 
the triangle are not extreme points because they lie between the vertices. 


(2) The set

    $X = \{[x_1, x_2] \mid x_1^2 + x_2^2 \le 1\}$

is convex. Every point on the circumference is an extreme point.

(3) The set in Fig. 6-8 is not convex, since the line joining $\mathbf{x}_1$ and $\mathbf{x}_2$
does not lie entirely within the set. The set is re-entrant.

(4) The four sets in Fig. 6-9 are convex, and the extreme points are
the vertices. Point $\mathbf{x}_1$ is not an extreme point because it can be
represented as a convex combination of $\mathbf{x}_2$, $\mathbf{x}_3$ with $0 < \lambda < 1$.


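A hyperplane is a convex set: If $\mathbf{x}_1$, $\mathbf{x}_2$ are on the hyperplane, that is,
$\mathbf{c}\mathbf{x}_1 = z$ and $\mathbf{c}\mathbf{x}_2 = z$, then $\mathbf{x} = \lambda\mathbf{x}_2 + (1 - \lambda)\mathbf{x}_1$ is on the hyperplane,
since

    $\mathbf{c}\mathbf{x} = \lambda\mathbf{c}\mathbf{x}_2 + (1 - \lambda)\mathbf{c}\mathbf{x}_1 = \lambda z + (1 - \lambda)z = z.$

Figure 6-8

Figure 6-9

Similarly, a closed half-space is also a convex set. Suppose $\mathbf{x}_1$, $\mathbf{x}_2$ are in
the closed half-space $\mathbf{c}\mathbf{x} \le z$; if $\mathbf{x} = \lambda\mathbf{x}_2 + (1 - \lambda)\mathbf{x}_1$ ($0 \le \lambda \le 1$), then

    $\mathbf{c}\mathbf{x} = \lambda\mathbf{c}\mathbf{x}_2 + (1 - \lambda)\mathbf{c}\mathbf{x}_1 \le \lambda z + (1 - \lambda)z = z,$

and $\mathbf{x}$ is in the half-space. The same arguments will prove that an open
half-space is a convex set.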

The intersection of two convex sets is also a convex set. Given the convex
sets $X_1$, $X_2$, let $\mathbf{x}_1$, $\mathbf{x}_2$ be any two points in $X_3 = X_1 \cap X_2$ (if there is
only one point in $X_3$, then $X_3$ is automatically convex). Thus,

    $\lambda\mathbf{x}_2 + (1 - \lambda)\mathbf{x}_1 \in X_1$  for $0 \le \lambda \le 1$,

    $\lambda\mathbf{x}_2 + (1 - \lambda)\mathbf{x}_1 \in X_2$  for $0 \le \lambda \le 1$.

Hence

    $\lambda\mathbf{x}_2 + (1 - \lambda)\mathbf{x}_1 \in X_1 \cap X_2 = X_3,$

and $X_3$ is convex. In Problem 6-15 the reader will be required to show
that if $X_i$ ($i = 1, \ldots, m$) are convex, then $X = \bigcap_{i=1}^{m} X_i$ is also convex.

If $X_1$, $X_2$ are closed sets, then $X_3 = X_1 \cap X_2$ is also closed. We see
immediately that an interior point of both $X_1$ and $X_2$ will also be an
interior point of $X_3$. Similarly, a point which is not in $X_1$ and/or not in $X_2$
cannot be a boundary point of $X_3$. Therefore, every boundary point of
$X_3$ is a boundary point of $X_1$ or $X_2$. However, $X_1$, $X_2$, and hence $X_3$
contain all their boundary points; thus $X_3$ is closed. The same is true for
the intersection of any finite number of closed sets (proof to be furnished
in Problem 6-16).

We have shown that hyperplanes and half-spaces are convex sets.
Since the intersection of any finite number of convex sets is also convex,
the intersection of a finite number of hyperplanes, or half-spaces, or of both
is also a convex set. Furthermore, the intersection of a finite number of
hyperplanes, or closed half-spaces, or of both is a closed convex set.

This has immediate application to linear programming. In Section 1-4,
we saw that the constraints on a linear programming problem can be
written

    $\sum_{j=1}^{r} a_{ij}x_j \;\{\le, =, \ge\}\; b_i, \quad i = 1, \ldots, m,$    (6-26)

where one of the $\ge$, $=$, $\le$ signs holds for each $i$. In addition, there are
the non-negativity restrictions

    $x_j \ge 0, \quad j = 1, \ldots, r.$    (6-27)

If we define the row vector $\mathbf{a}^i$ by

    $\mathbf{a}^i = (a_{i1}, \ldots, a_{ir}),$    (6-28)

Eq. (6-26) becomes

    $\mathbf{a}^i\mathbf{x} \;\{\le, =, \ge\}\; b_i, \quad i = 1, \ldots, m.$    (6-29)

Each one of the constraints requires that the allowable $\mathbf{x}$ be in some given
closed half-space in $E^r$, or, if the strict equality holds, lie on some given
hyperplane.

The non-negativity restrictions can also be written in the form of (6-29).
If we write

    $\mathbf{a}^{m+j} = \mathbf{e}_j',$    (6-30)

then $x_j \ge 0$ becomes

    $\mathbf{a}^{m+j}\mathbf{x} = \mathbf{e}_j'\mathbf{x} \ge 0, \quad j = 1, \ldots, r.$    (6-31)

Each of the non-negativity restrictions also requires that the allowable $\mathbf{x}$
be in some closed half-space. The region of $E^n$ defined by $x_j \ge 0$,
$j = 1, \ldots, n$, is called the non-negative orthant of $E^n$.

Now we can see that any feasible solution $\mathbf{x}$ must simultaneously be an
element of each of the following sets [(6-29) and (6-31)]:

    $X_i = \{\mathbf{x} \mid \mathbf{a}^i\mathbf{x} \;(\le, =, \ge)\; b_i\}, \quad i = 1, \ldots, m + r,$    (6-32)

where $b_i = 0$, $i = m + 1, \ldots, m + r$. This is equivalent to saying that
the set of feasible solutions $X$ is the intersection of the sets $X_i$:

    $X = \bigcap_{i=1}^{m+r} X_i.$    (6-33)

Therefore, the set of feasible solutions to a linear programming problem (if
a feasible solution exists) is a closed convex set.
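
The description of the feasible set as an intersection of closed half-spaces and hyperplanes translates directly into a membership test. The following is a minimal sketch under stated assumptions: the function names, the string encoding of the constraint sense, and the small two-variable constraint system are all hypothetical and chosen only for illustration.

```python
def dot(a, x):
    """Scalar product of a row vector a and a point x."""
    return sum(ai * xi for ai, xi in zip(a, x))


def satisfies(a, sense, b, x, tol=1e-9):
    """Check one constraint a x {<=, =, >=} b, as in (6-29)."""
    lhs = dot(a, x)
    if sense == "<=":
        return lhs <= b + tol
    if sense == ">=":
        return lhs >= b - tol
    if sense == "=":
        return abs(lhs - b) <= tol
    raise ValueError("sense must be '<=', '=' or '>='")


def is_feasible(rows, senses, rhs, x, tol=1e-9):
    """x is feasible if it lies in every set X_i of (6-32), including x_j >= 0."""
    if any(xj < -tol for xj in x):
        return False
    return all(satisfies(a, s, b, x, tol) for a, s, b in zip(rows, senses, rhs))


# Assumed example: x1 + x2 <= 4, x1 - x2 >= -2, x1, x2 >= 0.
rows, senses, rhs = [[1.0, 1.0], [1.0, -1.0]], ["<=", ">="], [4.0, -2.0]
p, q = [0.0, 2.0], [4.0, 0.0]
midpoint = [(a + b) / 2.0 for a, b in zip(p, q)]
```

With these data, `p` and `q` are both feasible, and so is their midpoint, as the convexity of the feasible set requires.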

The preceding analysis also shows that the set of solutions to a system
of $m$ linear equations in $n$ unknowns, $\mathbf{A}\mathbf{x} = \mathbf{b}$, is a closed convex set. To
see this, note that the set of equations can be written

    $\mathbf{a}^i\mathbf{x} = b_i, \quad i = 1, \ldots, m,$    (6-34)

where $\mathbf{a}^i$ is the $i$th row of $\mathbf{A}$. The set of points which satisfies the $i$th
equation comprises the hyperplane $\mathbf{a}^i\mathbf{x} = b_i$. The set of points which
simultaneously satisfies all $m$ equations is the intersection of the $m$ hyperplanes
(6-34); it is therefore a closed convex set provided that a solution exists.
Furthermore, the set of non-negative solutions to $\mathbf{A}\mathbf{x} = \mathbf{b}$, that is, the
set of solutions with $\mathbf{x} \ge \mathbf{0}$, is also a closed convex set.

We have already defined a convex combination of two points $\mathbf{x}_1$, $\mathbf{x}_2$
to be $\lambda\mathbf{x}_2 + (1 - \lambda)\mathbf{x}_1$, $0 \le \lambda \le 1$. This definition can easily be
generalized to the concept of a convex combination of $m$ points.

Convex combination: A convex combination of a finite number of points
$\mathbf{x}_1, \ldots, \mathbf{x}_m$ is defined as a point

    $\mathbf{x} = \sum_{i=1}^{m} \mu_i\mathbf{x}_i, \quad \mu_i \ge 0, \; i = 1, \ldots, m, \quad \sum_{i=1}^{m} \mu_i = 1.$    (6-35)

A convex combination of $m$ points can be interpreted physically. Suppose
that we associate a mass $m_i$ with the point $\mathbf{x}_i$. The center of mass of the
$m$ points is then given by

    $\frac{1}{M} \sum_{i=1}^{m} m_i\mathbf{x}_i, \quad M = \sum_{i=1}^{m} m_i.$    (6-36)

If $\mu_i = m_i/M$, the center of mass is obtained from (6-35). Hence a
convex combination of $m$ points can be thought of as the center of mass
of the points, with the mass assigned to $\mathbf{x}_i$ being a fraction $\mu_i$ of the total
mass.
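
The center-of-mass interpretation is easy to reproduce numerically. In the sketch below the function names and the sample points and masses are assumptions made for illustration; the only content taken from the text is the rule $\mu_i = m_i/M$ and the definition (6-35).

```python
def convex_combination(points, weights, tol=1e-12):
    """Return sum_i mu_i x_i after checking mu_i >= 0 and sum_i mu_i = 1, as in (6-35)."""
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > tol:
        raise ValueError("weights must be non-negative and sum to 1")
    n = len(points[0])
    return [sum(w * p[j] for w, p in zip(weights, points)) for j in range(n)]


def center_of_mass(points, masses):
    """Center of mass (6-36), computed as a convex combination with mu_i = m_i / M."""
    total = sum(masses)
    return convex_combination(points, [m / total for m in masses])


# Assumed example: three points in the plane carrying masses 1, 1 and 2.
points = [[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]]
masses = [1.0, 1.0, 2.0]
com = center_of_mass(points, masses)
```

The weights here are 1/4, 1/4 and 1/2, so the heavier third point pulls the center of mass toward itself.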

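The set of all convex combinations of a finite number of points $\mathbf{x}_1, \ldots, \mathbf{x}_m$
is a convex set, that is, the set

    $\{\mathbf{x} \mid \mathbf{x} = \sum_{i=1}^{m} \mu_i\mathbf{x}_i, \text{ all } \mu_i \ge 0, \; \sum_{i=1}^{m} \mu_i = 1\}$    (6-37)

is convex. To prove this, let $\mathbf{v}$, $\mathbf{w}$ be any points such that

    $\mathbf{v} = \sum_{i=1}^{m} \mu_i'\mathbf{x}_i, \quad \mu_i' \ge 0, \quad \sum_{i=1}^{m} \mu_i' = 1,$

    $\mathbf{w} = \sum_{i=1}^{m} \mu_i''\mathbf{x}_i, \quad \mu_i'' \ge 0, \quad \sum_{i=1}^{m} \mu_i'' = 1.$

The set will be convex if $\lambda\mathbf{w} + (1 - \lambda)\mathbf{v}$ is also in the set for any
$\lambda$ ($0 \le \lambda \le 1$). Now

    $\lambda\mathbf{w} + (1 - \lambda)\mathbf{v} = \sum_{i=1}^{m} [\lambda\mu_i'' + (1 - \lambda)\mu_i']\mathbf{x}_i,$    (6-38)

    $\lambda\mu_i'' + (1 - \lambda)\mu_i' \ge 0,$

    $\sum_{i=1}^{m} [\lambda\mu_i'' + (1 - \lambda)\mu_i'] = \lambda \sum_{i=1}^{m} \mu_i'' + (1 - \lambda) \sum_{i=1}^{m} \mu_i' = 1.$

Thus $\lambda\mathbf{w} + (1 - \lambda)\mathbf{v}$ is also a convex combination of the $\mathbf{x}_i$, and the set
is convex.

Example: Figure 6-10 shows the set of all convex combinations of the
points $\mathbf{x}_1, \ldots, \mathbf{x}_7$. In $E^2$ we find the set of all convex combinations of $m$
points by connecting all the points with straight lines. The resulting
polygon and its interior is the desired set.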

6-5 The convex hull. Given any set $A$ which is not convex, it is possible
to imbed $A$ in another set $X$ which is convex such that every point in $A$
is also in $X$.* We often wish to find the “smallest” convex set containing
$A$. This smallest convex set which contains $A$ is called the convex hull of $A$.

Examples: (1) The convex hull of the set $A = \{[x_1, x_2] \mid x_1^2 + x_2^2 = 1\}$
is $X = \{[x_1, x_2] \mid x_1^2 + x_2^2 \le 1\}$. The convex hull of the points on the
circumference of a circle is the circumference plus the interior of the
circle. This is the smallest convex set containing the circumference.

(2) The convex hull of two points $\mathbf{x}_1$, $\mathbf{x}_2$ is the set of all convex
combinations of these points, $X = \{\mathbf{x} \mid \mathbf{x} = \lambda\mathbf{x}_2 + (1 - \lambda)\mathbf{x}_1, \text{ all } \lambda, \; 0 \le \lambda \le 1\}$.
This is the smallest convex set containing $\mathbf{x}_1$, $\mathbf{x}_2$.

So far we have not defined clearly the meaning of the term “smallest”
in $E^n$. To avoid intuitive interpretations, mathematicians define the
convex hull as follows:

Convex hull: The convex hull of a set $A$ is the intersection of all convex
sets which contain $A$.

The intersection of all convex sets containing A must be the smallest 
convex set which contains A. Hence, our definition is merely a more 
rigorous and elegant formulation of the intuitive idea of “smallest.” It 
should be recalled that the intersection of convex sets is also convex. 


* Note that $E^n$ is a convex set and hence $X$ always exists.

The convex hull of a finite number of points $\mathbf{x}_1, \ldots, \mathbf{x}_m$ is the set of all
convex combinations of $\mathbf{x}_1, \ldots, \mathbf{x}_m$. This theorem states that the convex
hull of $\mathbf{x}_1, \ldots, \mathbf{x}_m$ is the set

    $X = \{\mathbf{x} \mid \mathbf{x} = \sum_{i=1}^{m} \mu_i\mathbf{x}_i, \text{ all } \mu_i \ge 0, \; \sum_{i=1}^{m} \mu_i = 1\}.$    (6-39)

In the preceding section we have shown that $X$ is convex. The proof will
be made by induction on the number of points, that is, on $m$. We must
show that every convex set containing $\mathbf{x}_1, \ldots, \mathbf{x}_m$ also contains $X$.
Clearly the theorem is true for $m = 1$ since there is only one point and
$\mu_1 = 1$. We have already defined a set with one point as convex. The
theorem is equally obvious for $m = 2$, but we need not make use of this
directly. Now let us suppose that the theorem is true for $m - 1$, that is,
the convex hull of $\mathbf{x}_1, \ldots, \mathbf{x}_{m-1}$ is the set

    $X_1 = \{\mathbf{x} \mid \mathbf{x} = \sum_{i=1}^{m-1} \beta_i\mathbf{x}_i, \text{ all } \beta_i \ge 0, \; \sum_{i=1}^{m-1} \beta_i = 1\}.$    (6-40)

Then, consider the convex hull $X$ of $\mathbf{x}_1, \ldots, \mathbf{x}_m$. Obviously, $\mathbf{x}_m$ must be
an element of the convex hull. Similarly, every point in $X_1$ must be an
element of $X$ because $X_1$ is by assumption the convex hull of $\mathbf{x}_1, \ldots, \mathbf{x}_{m-1}$.
In addition, $X$ must contain all points on the line segments joining points
in $X_1$ to $\mathbf{x}_m$, that is, all points

    $\mathbf{x} = \lambda \sum_{i=1}^{m-1} \beta_i\mathbf{x}_i + (1 - \lambda)\mathbf{x}_m, \quad 0 \le \lambda \le 1.$    (6-41)

If

    $\mu_i = \lambda\beta_i, \; i = 1, \ldots, m - 1, \quad \mu_m = (1 - \lambda),$

then all $\mu_i \ge 0$ and

    $\sum_{i=1}^{m} \mu_i = \lambda \sum_{i=1}^{m-1} \beta_i + (1 - \lambda) = \lambda + (1 - \lambda) = 1.$

Furthermore, since each $\beta_i$ and $\lambda$ can vary between 0 and 1, each $\mu_i$ can
assume any value between 0 and 1, the only restriction being $\sum \mu_i = 1$.
Hence, the set defined by (6-41) is the set of all convex combinations of
$\mathbf{x}_1, \ldots, \mathbf{x}_m$ and represents the convex hull of $\mathbf{x}_1, \ldots, \mathbf{x}_m$ if $X_1$ is the
convex hull of $\mathbf{x}_1, \ldots, \mathbf{x}_{m-1}$ since it is the smallest convex set containing
$X_1$ and $\mathbf{x}_m$. By induction, the convex hull of $m$ points is the set of all
convex combinations of the $m$ points.
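
The induction step can be mirrored by a short computation: starting from the single weight $\mu_1 = 1$, each new point is brought in by the substitution $\mu_i = \lambda\beta_i$, $\mu_m = 1 - \lambda$. The sketch below is illustrative only; the function names and the sample values of $\lambda$ are assumptions.

```python
def extend_weights(beta, lam):
    """One induction step: weights beta on m-1 points become weights on m points.

    mu_i = lam * beta_i for i = 1, ..., m-1 and mu_m = 1 - lam, with 0 <= lam <= 1.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return [lam * b for b in beta] + [1.0 - lam]


def weights_from_lams(lams):
    """Start from mu_1 = 1 (the case m = 1) and add one point per value of lam."""
    mu = [1.0]
    for lam in lams:
        mu = extend_weights(mu, lam)
    return mu


# Assumed example: two extension steps give weights on three points.
mu = weights_from_lams([0.5, 0.25])
```

Each step preserves non-negativity and the unit sum, which is exactly what the displayed computation of $\sum \mu_i$ asserts.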

Note that the convex hull of $m$ points is also a closed set. In
Problem 6-31, the reader will be asked to provide the argument in detail.

We shall find it convenient to denote convex hulls of $m$ points by a
special name. The following definition is a direct generalization from $E^2$,
$E^3$, where the convex hull of $m$ points is a polyhedron.

Convex polyhedron: The convex hull of a finite number of points is
called the convex polyhedron spanned by these points.

It is obvious that the convex polyhedron spanned by $m$ points cannot
have more than $m$ extreme points since it is the set of all convex
combinations of the $m$ points. All elements of the set, except $\mathbf{x}_1, \ldots, \mathbf{x}_m$, lie
between other points of the set. Thus only $\mathbf{x}_1, \ldots, \mathbf{x}_m$ can be extreme
points. Each point $\mathbf{x}_1, \ldots, \mathbf{x}_m$ will not necessarily be an extreme point.
One or more of these points may be interior points of the convex
polyhedron (see Fig. 6-10).

This discussion suggests that any point in a convex polyhedron can be
represented as a convex combination of the extreme points of the
polyhedron, i.e., any $\mathbf{x}$ can be written

    $\mathbf{x} = \sum_i \mu_i\mathbf{x}_i^*, \quad \mu_i \ge 0, \quad \sum_i \mu_i = 1,$    (6-42)

where the $\mathbf{x}_i^*$ are extreme points. We shall prove this statement later.
However, not every convex set with a finite number of extreme points
has the property that any point in the set can be represented as a convex
combination of the extreme points. For example: It is not true that any
point in the convex set shown in Fig. 6-11 can be represented as a convex
combination of the extreme points 1, 2, 3. Intuitively, we can see the
reason for this: The set is unbounded. We shall prove in one of the
following sections that any strictly bounded closed convex set with a finite
number of extreme points is the convex hull of the extreme points.

Simplex: The convex hull of any set of $n + 1$ points from $E^n$ which do
not lie on a hyperplane in $E^n$ is called a simplex.

A simplex is a special case of a convex polyhedron. Since the $n + 1$
points do not lie on a hyperplane, $n$ points must be linearly independent,
for otherwise all the points would lie on a hyperplane passing through
the origin. In $E^2$ a triangle and its interior form a simplex. The three
points which generate the simplex are the vertices of the triangle.

*6-6 Theorems on separating hyperplanes. In this and the following
two sections, four important theorems are proved which are of
significance for a wide range of problems in linear programming, decision
theory, and game theory.

Theorem I: Given any closed convex set $X$, a point $\mathbf{y}$ either belongs to
the set $X$ or there exists a hyperplane which contains $\mathbf{y}$ such that all of $X$
is contained in one open half-space produced by that hyperplane.

This is the theorem of the separating hyperplanes. Before considering
the proof, note that the two parts of the theorem are mutually exclusive.
If $\mathbf{y}$ belongs to the hyperplane, then it does not belong to the open
half-space.

We shall begin proving the theorem by assuming that $\mathbf{y}$ does not
belong to $X$. We then find the point $\mathbf{w}$ in $X$ such that for all $\mathbf{u}$ in $X$

    $|\mathbf{w} - \mathbf{y}| = \min |\mathbf{u} - \mathbf{y}|.$    (6-43)

The point $\mathbf{w}$ is the point in $X$ closest to $\mathbf{y}$ (“closest” means “shortest
distance”). Since the set is closed, we know that the minimum distance is
actually assumed† for some $\mathbf{w}$. There can be only one such point $\mathbf{w}$,
for if there were two, the point halfway between them would be in $X$
and also closer‡ to $\mathbf{y}$.

* The results of Sections 6-6, 6-7, and 6-8 are important. The theorems are
geometrically quite obvious, but the proofs are rather tedious. It is sufficient
to read the theorems and to study their geometrical interpretation. If desired,
the material in the starred sections may be omitted entirely without loss of
continuity.

† We have not proved that the distance will actually assume its minimum
value for a point in the set. This proof (which exceeds the scope of this text)
could be based on the theorem of Weierstrass, which states that a continuous
function defined over a closed bounded set ($X$ need not be bounded, but only a
part of $X$ “near” to $\mathbf{y}$ must be considered) actually takes on its minimum at some
point in the set. Intuitively, however, the result is fairly obvious.

‡ Suppose that there are two points $\mathbf{w}_1$, $\mathbf{w}_2$ both of which have the same
minimum distance from $\mathbf{y}$. Then, by the triangle inequality, we obtain

    $|\tfrac{1}{2}(\mathbf{w}_1 + \mathbf{w}_2) - \mathbf{y}| = \tfrac{1}{2}|(\mathbf{w}_1 - \mathbf{y}) + (\mathbf{w}_2 - \mathbf{y})| \le \tfrac{1}{2}(|\mathbf{w}_1 - \mathbf{y}| + |\mathbf{w}_2 - \mathbf{y}|).$

The strict inequality holds if $\mathbf{w}_1 - \mathbf{y} \ne \lambda(\mathbf{w}_2 - \mathbf{y})$; this is the case here, since
$|\mathbf{w}_1 - \mathbf{y}| = |\mathbf{w}_2 - \mathbf{y}|$, and $\mathbf{w}_1 \ne \mathbf{w}_2$. Therefore,

    $|\tfrac{1}{2}(\mathbf{w}_1 + \mathbf{w}_2) - \mathbf{y}| < |\mathbf{w}_1 - \mathbf{y}|.$

Next, let us consider any $\mathbf{u} \in X$. Then the point

    $(1 - \lambda)\mathbf{w} + \lambda\mathbf{u}, \quad 0 \le \lambda \le 1,$

is in $X$. But by (6-43)

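    $|(1 - \lambda)\mathbf{w} + \lambda\mathbf{u} - \mathbf{y}|^2 \ge |\mathbf{w} - \mathbf{y}|^2, \quad 0 \le \lambda \le 1,$    (6-44)

or

    $|(\mathbf{w} - \mathbf{y}) + \lambda(\mathbf{u} - \mathbf{w})|^2 \ge |\mathbf{w} - \mathbf{y}|^2.$

Expanding the preceding expression, we arrive at

    $|\mathbf{w} - \mathbf{y}|^2 + 2\lambda(\mathbf{w} - \mathbf{y})'(\mathbf{u} - \mathbf{w}) + \lambda^2|\mathbf{u} - \mathbf{w}|^2 \ge |\mathbf{w} - \mathbf{y}|^2,$

or

    $2\lambda(\mathbf{w} - \mathbf{y})'(\mathbf{u} - \mathbf{w}) + \lambda^2|\mathbf{u} - \mathbf{w}|^2 \ge 0.$    (6-45)

Take $\lambda > 0$. Dividing (6-45) by $\lambda$ yields

    $2(\mathbf{w} - \mathbf{y})'(\mathbf{u} - \mathbf{w}) + \lambda|\mathbf{u} - \mathbf{w}|^2 \ge 0.$

Let $\lambda$ tend to zero. In the limit we have

    $(\mathbf{w} - \mathbf{y})'(\mathbf{u} - \mathbf{w}) \ge 0;$    (6-46)

however,

    $\mathbf{u} - \mathbf{w} = \mathbf{u} - \mathbf{y} - (\mathbf{w} - \mathbf{y}).$    (6-47)

Substituting (6-47) into (6-46), we have

    $(\mathbf{w} - \mathbf{y})'(\mathbf{u} - \mathbf{y}) \ge |\mathbf{w} - \mathbf{y}|^2.$

But $|\mathbf{w} - \mathbf{y}|^2 > 0$ because $\mathbf{w} \in X$ and $\mathbf{y} \notin X$. Therefore,

    $(\mathbf{w} - \mathbf{y})'(\mathbf{u} - \mathbf{y}) > 0,$    (6-48)

or

    $(\mathbf{w} - \mathbf{y})'\mathbf{u} > (\mathbf{w} - \mathbf{y})'\mathbf{y}.$    (6-49)

Define

    $\mathbf{c} = (\mathbf{w} - \mathbf{y})', \quad z = (\mathbf{w} - \mathbf{y})'\mathbf{y} = \mathbf{c}\mathbf{y}.$    (6-50)

Consider the hyperplane $\mathbf{c}\mathbf{x} = z$. Since $\mathbf{c}\mathbf{y} = z$, $\mathbf{y}$ is on the hyperplane.
However, according to (6-49), any $\mathbf{u} \in X$ satisfies

    $\mathbf{c}\mathbf{u} > z.$    (6-51)

Thus any point in $X$ is in the half-space* $\mathbf{c}\mathbf{x} > z$. The theorem is proved.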
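
The construction in the proof can be carried out explicitly when the closest point is known in closed form. The sketch below assumes that $X$ is the closed unit disk in $E^2$, for which the closest point to an outside $\mathbf{y}$ is $\mathbf{y}/|\mathbf{y}|$; this choice of set, the function names, and the sample grid are illustrative assumptions, while the definitions of $\mathbf{c}$ and $z$ follow (6-50).

```python
import math


def dot(a, b):
    """Scalar product of two vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))


def closest_point_in_unit_disk(y):
    """Closest point w of the closed unit disk to y (y itself if it is inside)."""
    norm = math.sqrt(dot(y, y))
    if norm <= 1.0:
        return list(y)
    return [yi / norm for yi in y]


def separating_hyperplane(y):
    """Return (c, z) with c = w - y and z = c y, as in (6-50)."""
    w = closest_point_in_unit_disk(y)
    c = [wi - yi for wi, yi in zip(w, y)]
    if dot(c, c) == 0.0:
        raise ValueError("y belongs to X, so no separating hyperplane is needed")
    return c, dot(c, y)


y = [3.0, 4.0]
c, z = separating_hyperplane(y)
# Sample points of the disk on a coarse grid and test the inequality (6-51).
grid = [[i / 4.0, j / 4.0] for i in range(-4, 5) for j in range(-4, 5)]
disk_points = [u for u in grid if dot(u, u) <= 1.0]
separated = all(dot(c, u) > z for u in disk_points)
```

For this $\mathbf{y}$ the hyperplane $\mathbf{c}\mathbf{x} = z$ passes through $\mathbf{y}$, and every sampled point of the disk lies strictly on one side of it.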

* The direction of the inequality in $\mathbf{c}\mathbf{x} > z$ is immaterial. If both sides
are multiplied by $(-1)$, we obtain $(-\mathbf{c})\mathbf{x} < -z$. The hyperplane $(-\mathbf{c})\mathbf{x} = -z$
also contains $\mathbf{y}$.

The geometrical interpretation of the theorem in $E^2$, $E^3$ is very simple.
Examination of Fig. 6-12 shows that when $\mathbf{y}$ is not in $X$, we take for the
hyperplane of the theorem the line through $\mathbf{y}$ perpendicular to the line
representing the shortest distance from $\mathbf{y}$ to $X$. Thus for $E^2$ and $E^3$, the
validity of the theorem is obvious. For $E^n$ we have proved it rigorously
by algebraic methods.

Theorem I states that for any $\mathbf{y} \notin X$ (regardless of how close $\mathbf{y}$ is to
the set $X$), there is a hyperplane $\mathbf{c}\mathbf{x} = z$ containing $\mathbf{y}$ such that $\mathbf{c}\mathbf{u} > z$
for all $\mathbf{u} \in X$. This fact suggests that if $\mathbf{w}$ is any boundary point of $X$,
there is a hyperplane $\mathbf{c}\mathbf{x} = z$ containing $\mathbf{w}$ such that $\mathbf{c}\mathbf{u} \ge z$ for all $\mathbf{u} \in X$.
This is true, as we shall prove shortly. The hyperplane containing the
boundary point is called a supporting hyperplane to the convex set $X$.

Supporting hyperplane: Given a boundary point $\mathbf{w}$ of a convex set $X$;
then $\mathbf{c}\mathbf{x} = z$ is called a supporting hyperplane at $\mathbf{w}$ if $\mathbf{c}\mathbf{w} = z$ and if
all of $X$ lies in one closed half-space produced by the hyperplane, that is,
$\mathbf{c}\mathbf{u} \ge z$ for all $\mathbf{u} \in X$ or $\mathbf{c}\mathbf{u} \le z$ for all $\mathbf{u} \in X$.

Theorem II: If w is a boundary point of a closed convex set, then there 
is at least one supporting hyperplane at w. 

Theorem I did not actually show that the point $\mathbf{w}$ closest to $\mathbf{y}$ was a
boundary point of $X$. Clearly, this has to be the case; otherwise there
would exist an $\mathbf{s} \in X$ on the line joining $\mathbf{w}$, $\mathbf{y}$, with

    $\mathbf{s} = \lambda\mathbf{w} + (1 - \lambda)\mathbf{y}, \quad 0 < \lambda < 1,$

such that

    $|\mathbf{s} - \mathbf{y}| = \lambda|\mathbf{w} - \mathbf{y}| < |\mathbf{w} - \mathbf{y}|,$

and $\mathbf{w}$ would not be closest to $\mathbf{y}$. Thus $\mathbf{w}$ is a boundary point of $X$.

Using (6-50) in (6-46), we have

    $\mathbf{c}(\mathbf{u} - \mathbf{w}) \ge 0,$

or

    $\mathbf{c}\mathbf{u} \ge \mathbf{c}\mathbf{w}.$    (6-52)

If we consider the hyperplane

    $\mathbf{c}\mathbf{x} = \mathbf{c}\mathbf{w} = z,$    (6-53)

then $\mathbf{w}$ is on this hyperplane, and by (6-52), $\mathbf{c}\mathbf{u} \ge z$ for all $\mathbf{u} \in X$. Thus,
boundary point $\mathbf{w}$ has a supporting hyperplane. This fact, however, does
not prove Theorem II since in Theorem I, $\mathbf{w}$ was not an arbitrary boundary
point but was determined by the choice of $\mathbf{y}$. Theorem II could be proved
if, for any given boundary point $\mathbf{w}$, we could find a point $\mathbf{y} \notin X$ such
that $\mathbf{w}$ was the closest point in the set to $\mathbf{y}$. It is clear how to do this
geometrically in $E^2$ and $E^3$.

We construct a normal to the set at $\mathbf{w}$, and any point on this normal,
but outside the set, will satisfy the condition. Such an approach cannot
be easily generalized to $E^n$ (in fact it is difficult to formalize in $E^2$ and $E^3$)
and hence we shall proceed in a slightly different way.

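Select any boundary point $\mathbf{w}$ of $X$. Consider an $\epsilon$ neighborhood about $\mathbf{w}$.
For any $\epsilon > 0$, however small, there are points inside the hypersphere which
are not in $X$. Choose a given $\epsilon$ ($\epsilon_k$) and select from the neighborhood a
$\mathbf{y}_k$ which is not in $X$. Then a boundary point $\mathbf{w}_k$ of $X$ will be the closest
point in $X$ to $\mathbf{y}_k$. We have shown that there is a supporting hyperplane
$\mathbf{c}_k\mathbf{x} = \mathbf{c}_k\mathbf{w}_k = z_k$ at $\mathbf{w}_k$. Next, we choose a sequence of $\epsilon_k$ such that
$\epsilon_k \to 0$ as $k \to \infty$. By the triangle inequality,

    $|\mathbf{w}_k - \mathbf{w}| = |\mathbf{w}_k - \mathbf{y}_k + \mathbf{y}_k - \mathbf{w}| \le |\mathbf{w}_k - \mathbf{y}_k| + |\mathbf{y}_k - \mathbf{w}|.$    (6-54)

The choice of $\mathbf{y}_k$ requires that $|\mathbf{y}_k - \mathbf{w}| < \epsilon_k$, and therefore $|\mathbf{y}_k - \mathbf{w}| \to 0$
as $k \to \infty$. Since $|\mathbf{w}_k - \mathbf{y}_k|$ is the shortest distance between $\mathbf{y}_k$ and $X$,
$|\mathbf{w}_k - \mathbf{y}_k| \le |\mathbf{y}_k - \mathbf{w}|$, and hence $|\mathbf{y}_k - \mathbf{w}_k| \to 0$ as $k \to \infty$. We
conclude that

    $|\mathbf{w}_k - \mathbf{w}| \to 0$ as $k \to \infty$, or $\mathbf{w}_k \to \mathbf{w}$.

For each $\mathbf{w}_k$ there is a supporting hyperplane $\mathbf{c}_k\mathbf{x} = z_k$. Dividing by
$|\mathbf{c}_k|$, we obtain

    $\mathbf{n}_k\mathbf{x} = b_k, \quad \mathbf{n}_k = \mathbf{c}_k/|\mathbf{c}_k|, \quad b_k = z_k/|\mathbf{c}_k|,$    (6-55)

and $|\mathbf{n}_k| = 1$. By the Schwarz inequality (Section 2-6),

    $|b_k| \le |\mathbf{n}_k||\mathbf{w}_k| = |\mathbf{w}_k|,$    (6-56)

because $\mathbf{w}_k$ is on the hyperplane (6-55). For sufficiently large $k$ there
exists an $r > 0$ such that $|\mathbf{w}_k| < r$ independent of $k$. This follows since

    $|\mathbf{w}_k| = |\mathbf{w}_k - \mathbf{w} + \mathbf{w}| \le |\mathbf{w}| + |\mathbf{w}_k - \mathbf{w}|,$

and $|\mathbf{w}_k - \mathbf{w}| \to 0$. An $r$ which satisfies the above requirement is
$r = 2|\mathbf{w}|$ (if $\mathbf{w} = \mathbf{0}$, any $r > 0$ will serve). Equation (6-56) can then be
written in the form of

    $-r < -|\mathbf{w}_k| \le b_k \le |\mathbf{w}_k| < r.$    (6-57)

Hence the $b_k$ form a bounded infinite sequence. Similarly, the components
of $\mathbf{n}_k$ form bounded infinite sequences because $|\mathbf{n}_k| = 1$. One theorem on
sequences states that bounded infinite sequences possess at least one limit
point. Hence, there is a subsequence of points $\mathbf{y}_k$ for which $\mathbf{n}_k \to \mathbf{n}$, and
$b_k \to b$ as $\mathbf{w}_k \to \mathbf{w}$. Furthermore, for every $k$, $\mathbf{n}_k\mathbf{w}_k - b_k = 0$, and in
the limit, $\mathbf{n}\mathbf{w} = b$. Thus we have shown that for $\mathbf{w}$ there is a hyperplane
$\mathbf{n}\mathbf{x} = b$ for which

    $\mathbf{n}\mathbf{w} = b, \quad \mathbf{n}\mathbf{u} \ge b, \quad \text{all } \mathbf{u} \in X;$

and this is a supporting hyperplane. Theorem II has been proved.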



Unfortunately, in proving the theorem, we had to use the ideas of limits
and bounded sequences. We have not discussed these subjects, which
may be unfamiliar to the reader. However, these concepts had to be
introduced since we are not sure that $\mathbf{n}_k$, $b_k$ approach unique values when
we choose the sequence of points $\mathbf{y}_k$ in some specific way. For arbitrary
sequences $\mathbf{y}_k$, it is not always true that $\mathbf{n}_k$, $b_k$ approach unique values.
The theorem on bounded sequences assures us that there is a sequence
of $\mathbf{y}_k$ for which $\mathbf{n}_k$ and $b_k$ do approach unique limits. However, the
supporting hyperplane at $\mathbf{w}$ need not be unique. Figure 6-13 illustrates that
any one of the hyperplanes 1, 2, 3 is a supporting hyperplane at $\mathbf{w}$. Hence,
an arbitrary sequence of points $\mathbf{y}_k$ could not be used to obtain unique
values.

It should be clear to the reader that Theorem II holds even if $X$ is not
closed, i.e., there is at least one supporting hyperplane at a boundary point
$\mathbf{w}$, regardless of whether or not $\mathbf{w} \in X$.

*6-7 A basic result in linear programming. We have already shown that
the set of feasible solutions to a linear programming problem is a closed
convex set, and that the function to be optimized is a hyperplane. This
hyperplane is moved parallel to itself over the convex set of feasible
solutions until $z$ is made as large as possible (if $z$ is being maximized)
while still having at least one point $\mathbf{x}$ on the hyperplane in the convex set
of feasible solutions.

Hence, if a given hyperplane corresponds to the optimal value of $z$,
then no point on the hyperplane can be an interior point of the convex
set† of feasible solutions $X$. To see this, let us suppose that $z = \mathbf{c}\mathbf{x}$ is an
optimal hyperplane and that one of its points $\mathbf{x}_0$ is an interior point of the
set. We select an $\epsilon > 0$ such that every point in this $\epsilon$ neighborhood of $\mathbf{x}_0$ is
in $X$. The point $\mathbf{x}_1 = \mathbf{x}_0 + (\epsilon/2)(\mathbf{c}'/|\mathbf{c}|)$ is in $X$, and $\mathbf{c}\mathbf{x}_1 = z + (\epsilon/2)|\mathbf{c}| > z$.
This contradicts the fact that $z$ is the maximum value. Hence every
point of $X$ on the optimal hyperplane must be a boundary point.
Therefore, if $\mathbf{x}_0$ is an optimal solution to the linear programming problem,
then $z = \mathbf{c}\mathbf{x}_0$ and $\mathbf{c}\mathbf{u} \le z$ for all $\mathbf{u} \in X$ (we assume, of course, that $z$ is
being maximized). Thus, an optimal hyperplane is a supporting
hyperplane to the convex set of feasible solutions at an optimal solution $\mathbf{x}_0$.
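
The step used to rule out interior optimal points can be verified with numbers. In the sketch below the vectors `c` and `x0` and the radius `eps` are assumed sample values, and `step_along_normal` is a hypothetical helper; the computation itself is the one in the text, $\mathbf{x}_1 = \mathbf{x}_0 + (\epsilon/2)(\mathbf{c}'/|\mathbf{c}|)$.

```python
import math


def dot(a, b):
    """Scalar product of two vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))


def step_along_normal(c, x0, eps):
    """Move a distance eps/2 from x0 in the direction of c, staying inside the eps neighborhood."""
    norm_c = math.sqrt(dot(c, c))
    return [xi + (eps / 2.0) * ci / norm_c for xi, ci in zip(x0, c)]


c = [3.0, 4.0]       # assumed price vector, |c| = 5
x0 = [1.0, 1.0]      # assumed interior point lying on the hyperplane c x = z
eps = 0.2            # assumed neighborhood radius
z = dot(c, x0)
x1 = step_along_normal(c, x0, eps)
gain = dot(c, x1) - z                                    # should equal (eps / 2) * |c|
moved = math.sqrt(sum((a - b) ** 2 for a, b in zip(x1, x0)))
```

The gain is strictly positive while `x1` stays within the neighborhood, so a point with such a neighborhood inside $X$ cannot give the maximum of $z$.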

Theorem III: A closed convex set which is bounded from below has
extreme points in every supporting hyperplane.

The convex set of feasible solutions to a linear programming problem is
closed and bounded from below by $\mathbf{0}$ because $x_j \ge 0$ for all $j$. Hence,
the theorem states that if there is an optimal solution, at least one of the
extreme points of the convex set of feasible solutions will be an optimal
solution. In $E^n$, as in $E^2$, $E^3$, the convex set of feasible solutions will have
only a finite number of extreme points.‡ Hence, if we had the means of
selecting the extreme points of the convex set of feasible solutions, only a
finite number of points would have to be examined to find an optimal
solution to the problem. And indeed, it is possible to determine
analytically the extreme points. This is the basis of the simplex method. We
move from one extreme point to a new one (having a value of $z$ at least as
large as the preceding one) until an optimal solution is found.

† It may turn out that there is no maximum value of $z$, that is, $z$ can be made
arbitrarily large. Then this result does not hold. However, when we speak of
an optimal solution to a linear programming problem, we shall imply that $z$
has a finite maximum or finite minimum.

‡ In Problem 6-33 the reader will be asked to prove the theorem for $E^n$.

We shall now prove Theorem III. The hyperplane $\mathbf{c}\mathbf{x} = z$ will be
assumed to be a supporting hyperplane at $\mathbf{x}_0$ to the closed convex set $X$
which is bounded from below. The intersection of $X$ and the set
$S = \{\mathbf{x} \mid \mathbf{c}\mathbf{x} = z\}$ will be denoted by $T$. The intersection is not empty
because $\mathbf{x}_0 \in T$; furthermore, since $X$ and $S$ are closed convex sets, so
is $T$; $T$ is also bounded from below since $X$ is.

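We shall show that any extreme point of $T$ is also an extreme point of $X$.
If $\mathbf{t}$ is any point in $T$, and if

    $\mathbf{t} = \lambda\mathbf{x}_2 + (1 - \lambda)\mathbf{x}_1, \quad 0 < \lambda < 1,$

where $\mathbf{x}_1$, $\mathbf{x}_2 \in X$, then $\mathbf{x}_1$, $\mathbf{x}_2 \in T$. This follows from the fact that

    $\mathbf{c}\mathbf{t} = \lambda\mathbf{c}\mathbf{x}_2 + (1 - \lambda)\mathbf{c}\mathbf{x}_1 = z,$    (6-58)

and $\mathbf{c}\mathbf{x}_2 \ge z$, $\mathbf{c}\mathbf{x}_1 \ge z$ because $\mathbf{c}\mathbf{x} = z$ is a supporting hyperplane. Noting
that $\lambda$, $(1 - \lambda) > 0$, we see that (6-58) will hold if and only if $\mathbf{c}\mathbf{x}_2 = z$,
$\mathbf{c}\mathbf{x}_1 = z$, that is, if and only if $\mathbf{x}_1$, $\mathbf{x}_2 \in T$. Thus an extreme point of $T$
cannot be represented as a convex combination of any two points in $X$
with $0 < \lambda < 1$. Hence an extreme point of $T$ is an extreme point of $X$.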

We still have to prove that T actually has an extreme point; this will 
be accomplished by finding an extreme point. Out of all the points in T, 
choose the one with the smallest (algebraic) first component. There is at 
least one such point since T is closed and bounded from below. 

If there is more than one point with a smallest first component, choose
the point or points with the smallest first and second components. If
again there is more than one point with the smallest first and second
components, find the point or points with the smallest first, second, and third
components, etc. Finally, a unique point will be obtained since only one
point can have all its components of minimum algebraic value.
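
The selection rule just described is a lexicographic minimization, which coincides with the built-in ordering of Python tuples. The sketch below applies it to a finite sample of points; the sample and the helper names are assumptions, and a finite sample only illustrates the rule, since the set $T$ of the proof is in general infinite.

```python
from itertools import combinations


def lexicographic_minimum(points):
    """Smallest first component, ties broken by the second component, and so on."""
    return min(tuple(p) for p in points)


def is_midpoint_of_distinct_pair(t, points):
    """True if t is halfway between two distinct points of the sample."""
    pts = [tuple(p) for p in points]
    return any(
        a != b and all(2 * tj == aj + bj for tj, aj, bj in zip(t, a, b))
        for a, b in combinations(pts, 2)
    )


# Assumed sample; three of the points tie on the first component.
sample = [(1, 5), (0, 7), (0, 3), (2, -1), (0, 5)]
t_star = lexicographic_minimum(sample)
```

In this sample the point selected is not halfway between any two other sample points, whereas `(0, 5)` is, which matches the role of $\mathbf{t}^*$ in the argument that follows.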

The unique point $\mathbf{t}^*$ determined by the above process is an extreme
point. If $\mathbf{t}^*$ were not an extreme point, we could write

    $\mathbf{t}^* = \lambda\mathbf{t}_1 + (1 - \lambda)\mathbf{t}_2, \quad 0 < \lambda < 1; \quad \mathbf{t}_1 \ne \mathbf{t}_2 \in T.$    (6-59)

Suppose the unique $\mathbf{t}^*$ was determined on minimizing the $j$th component.
If $t_{j1}$, $t_{j2}$ are the $j$th components of $\mathbf{t}_1$, $\mathbf{t}_2$, then the $j$th component of (6-59)
is

    $t_j^* = \lambda t_{j1} + (1 - \lambda)t_{j2}, \quad 0 < \lambda < 1.$    (6-60)

Furthermore (why?)

    $t_i^* = t_{i1} = t_{i2} \quad (i = 1, \ldots, j - 1).$

But then (6-60) requires that $t_j^* = t_{j1} = t_{j2}$, for otherwise $t_j^* >
\min [t_{j1}, t_{j2}]$. However, this result contradicts the fact that there is only
one point with this $t_j^*$ when all components $1, \ldots, j - 1$ are at their
minimum values. Consequently, $\mathbf{t}^*$ cannot be represented as a convex
combination of any two other points in $T$ ($0 < \lambda < 1$). Hence $\mathbf{t}^*$ is an
extreme point and the theorem is proved.† The above proof also
demonstrates that a strictly bounded convex set has extreme points in every
supporting hyperplane.


*6-8 Convex hull of extreme points. 

Theorem IV: If a closed, strictly bounded convex set $X$ has a finite
number of extreme points, any point in the set can be written as a convex
combination of the extreme points, that is, set $X$ is the convex hull of its extreme
points.

We were led to suspect in Section 6-5 that a result of this sort might
be true. We are now able to prove it.

Take the extreme points of $X$ to be $\mathbf{y}_1, \ldots, \mathbf{y}_m$; $S$ will be defined as
the convex hull of the extreme points,

    $S = \{\mathbf{y} \mid \mathbf{y} = \sum_{i=1}^{m} \mu_i\mathbf{y}_i, \text{ all } \mu_i \ge 0, \; \sum_{i=1}^{m} \mu_i = 1\}.$    (6-61)

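Suppose that there is a point $\mathbf{v}_0 \in X$ and $\mathbf{v}_0 \notin S$. Then by Theorem I
(on separating hyperplanes) there is a hyperplane $\mathbf{c}\mathbf{x} = z$ containing $\mathbf{v}_0$
such that $\mathbf{c}\mathbf{y} > z$ for all $\mathbf{y} \in S$. In addition, there is a $\mathbf{w} \in S$ which is
closest to $\mathbf{v}_0$ ($S$ is closed). Furthermore, we can write $\mathbf{c}' = \mathbf{w} - \mathbf{v}_0$.
Consider a set of points $\mathbf{x}_k$ not necessarily in $X$ such that

    $\mathbf{x}_k = \mathbf{v}_0 - \lambda_k(\mathbf{w} - \mathbf{v}_0); \quad \lambda_k > 0.$    (6-62)

Then

    $|\mathbf{w} - \mathbf{x}_k| = (1 + \lambda_k)|\mathbf{w} - \mathbf{v}_0| > |\mathbf{w} - \mathbf{v}_0|.$

The hyperplane with normal $\mathbf{c}$ which passes through $\mathbf{x}_k$ may be written
$\mathbf{c}\mathbf{x} = z_k$, and

    $\mathbf{c}\mathbf{x} = \mathbf{c}\mathbf{x}_k = \mathbf{c}\mathbf{v}_0 - \lambda_k|\mathbf{c}|^2 = z - \lambda_k|\mathbf{c}|^2 = z_k < z.$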

For any $\mathbf{y} \in S$, $\mathbf{c}\mathbf{y} > z$, and hence $\mathbf{c}\mathbf{y} > z_k$. Increase $\lambda_k$ to the largest
possible value for which the hyperplane $\mathbf{c}\mathbf{x} = z_k$ will contain a point of $X$.
There will be a largest $\lambda_k$ since $X$ is closed and strictly bounded. Call $z^*$
the $z_k$ for this hyperplane. Hence a $\mathbf{v}^* \in X$ exists such that $\mathbf{c}\mathbf{v}^* = z^*$,
and $\mathbf{c}\mathbf{v} \ge z^*$ for all $\mathbf{v} \in X$. It follows that $\mathbf{c}\mathbf{x} = z^*$ is a supporting
hyperplane to $X$. However, it contains no extreme points of $X$, since they are
all in $S$, and every point $\mathbf{y} \in S$ satisfies $\mathbf{c}\mathbf{y} > z^*$. This contradicts the
fact that there are extreme points in every supporting hyperplane. Hence
every point of $X$ must be in $S$, and $X$ is the convex hull of its extreme
points.

† The same line of reasoning will reveal that if a linear programming problem
has a feasible solution, at least one feasible solution will be a boundary point,
and there will be at least one extreme point (see Problem 6-23).

Figure 6-14

Figure 6-15

Figure 6-14 interprets our procedure geometrically. We assume that
the points $\mathbf{y}_1$, $\mathbf{y}_2$, $\mathbf{y}_3$ are the only extreme points of the convex set $X$.
They generate the convex hull $S$. Point $\mathbf{v}_0$ is not in $S$. A hyperplane (a)
is passed through $\mathbf{v}_0$ so that all of $S$ lies in one half-space produced by the
hyperplane. Then the hyperplane is moved parallel to itself until (b)
is reached; if it were moved any further, no point of $X$ would be on the
hyperplane. Hence (b) is a supporting hyperplane which presumably
contains no extreme points of $X$. This is obviously not true (see Fig. 6-14).
The contradiction is, of course, a result of our forgetfulness: We did not
count extreme point $\mathbf{y}_4$ in our original determination of the extreme points
of $X$.

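Example: Suppose that we wish to write any point $\mathbf{w}$ inside a triangle
as a convex combination of the vertices (extreme points):

    $\mathbf{w} = \sum_{i=1}^{3} \mu_i\mathbf{x}_i, \quad \mu_i \ge 0, \quad \sum_{i=1}^{3} \mu_i = 1.$

The situation is illustrated in Fig. 6-15: First, draw a line from $\mathbf{x}_2$ through
$\mathbf{w}$. It will intersect the opposite side of the triangle at $\mathbf{v}$. Then

    $\mathbf{w} = \lambda_1\mathbf{x}_2 + (1 - \lambda_1)\mathbf{v}, \quad 0 \le \lambda_1 \le 1;$

but

    $\mathbf{v} = \lambda_2\mathbf{x}_1 + (1 - \lambda_2)\mathbf{x}_3, \quad 0 \le \lambda_2 \le 1;$

thus

    $\mathbf{w} = \lambda_1\mathbf{x}_2 + (1 - \lambda_1)\lambda_2\mathbf{x}_1 + (1 - \lambda_1)(1 - \lambda_2)\mathbf{x}_3.$

Let $\mu_1 = \lambda_2(1 - \lambda_1)$, $\mu_2 = \lambda_1$, $\mu_3 = (1 - \lambda_1)(1 - \lambda_2)$.

Clearly, $\mu_i \ge 0$ and

    $\mu_1 + \mu_2 + \mu_3 = \lambda_2(1 - \lambda_1) + \lambda_1 + (1 - \lambda_1) - (1 - \lambda_1)\lambda_2 = 1.$

The desired expression for $\mathbf{w}$ is $\mathbf{w} = \sum_{i=1}^{3} \mu_i\mathbf{x}_i$.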
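
The two-step construction of the example can be checked against the direct three-point formula. In the sketch below the triangle vertices and the values of $\lambda_1$, $\lambda_2$ are assumed sample data, and the helper names are hypothetical; the weight formulas are those of the example.

```python
def triangle_weights(lam1, lam2):
    """Weights mu_1, mu_2, mu_3 obtained from lam1 (toward x2) and lam2 (along the side x1 x3)."""
    mu1 = lam2 * (1.0 - lam1)
    mu2 = lam1
    mu3 = (1.0 - lam1) * (1.0 - lam2)
    return [mu1, mu2, mu3]


def combine(points, weights):
    """Weighted sum of points, component by component."""
    n = len(points[0])
    return [sum(w * p[j] for w, p in zip(weights, points)) for j in range(n)]


x1, x2, x3 = [0.0, 0.0], [4.0, 0.0], [0.0, 4.0]   # assumed vertices
lam1, lam2 = 0.5, 0.25                             # assumed parameters

v = combine([x1, x3], [lam2, 1.0 - lam2])          # point on the side opposite x2
w_two_step = combine([x2, v], [lam1, 1.0 - lam1])  # point on the segment from v to x2
mu = triangle_weights(lam1, lam2)
w_direct = combine([x1, x2, x3], mu)               # same point from the three vertices
```

Both routes give the same point, and the three weights are non-negative with unit sum.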

From the definition of a convex polyhedron and from Theorem IV it 
follows that every closed, strictly bounded convex set with a finite number 
of extreme points is a convex polyhedron. 

Convex sets with only a finite number of extreme points are essentially 
convex polyhedrons. However, they may be of the type shown in Fig. 6-11, 
that is, they may not be strictly bounded. For convex sets with a finite 
number of extreme points, it is useful to introduce the concepts of an edge 
and of adjacent extreme points. 

Edge: Let $\mathbf{x}_1^*$, $\mathbf{x}_2^*$ be distinct extreme points of the convex set $X$. The line
segment joining them is called an edge of the convex set if it is the
intersection of $X$ with a supporting hyperplane. If $\mathbf{x}^*$ is an extreme point of $X$,
and if there exists another point $\hat{\mathbf{x}}$ in $X$ such that $\mathbf{x} = \mathbf{x}^* + \lambda(\hat{\mathbf{x}} - \mathbf{x}^*)$ is
in $X$ for every $\lambda \ge 0$, and if, in addition, the set $L = \{\mathbf{x} \mid \mathbf{x} = \mathbf{x}^* + \lambda(\hat{\mathbf{x}} - \mathbf{x}^*),
\text{ all } \lambda \ge 0\}$ is the intersection of $X$ with a supporting hyperplane, then
the set $L$ is said to be an edge of $X$ which extends to infinity.

Adjacent extreme points: Two distinct extreme points $\mathbf{x}_1^*$, $\mathbf{x}_2^*$ of the
convex set $X$ are called adjacent if the line segment joining them is an edge
of the convex set.

These definitions conform to our conception of an edge and of adjacent
extreme points in $E^2$, $E^3$. In Fig. 6-13, the line segment joining the
extreme points $\mathbf{w}$, $\mathbf{x}_1$ is the intersection of $X$ with hyperplane 1, and hence
is an edge of the set. Since the line joining them is an edge of the set,
$\mathbf{w}$, $\mathbf{x}_1$ are adjacent extreme points.

6-9 Introduction to convex cones. In the preceding sections, we have
examined some of the properties of a special class of convex sets, the
convex polyhedrons. We now study another class of convex sets, the convex
cones, which are useful in the theory of linear programming and in linear
economic models.

Cone: A cone $C$ is a set of points with the following property: If $\mathbf{x}$ is in
the set, so is $\mu\mathbf{x}$ for all $\mu \ge 0$.

The cone "generated" by a set of points $X = \{\mathbf{x}\}$ is the set

$$C = \{\mathbf{y} \mid \mathbf{y} = \mu\mathbf{x}, \text{ all } \mu \ge 0 \text{ and all } \mathbf{x} \in X\}.$$ (6-63)

Note that a cone is never a strictly bounded set (except in the trivial case
where 0 is the only element in the cone). However, a cone may be bounded
from above or from below. In $E^2$ and $E^3$, a cone as "a set of points" is
often identical with the usual geometrical concept of a cone.

Vertex: The point 0 is an element of any cone and is called the vertex 
of the cone. 

Example: Figure 6-16 shows a cone in $E^3$ generated by the set of
points

$$X = \{[x_1, x_2, x_3] \mid x_1^2 + x_2^2 \le 1, \ x_3 = 1\}.$$
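
For this example, membership in the generated cone can be tested directly. The sketch below assumes the generating set is the closed unit disk at height x3 = 1 (x1^2 + x2^2 <= 1, x3 = 1); the function names are hypothetical. A point y = mu*x with mu >= 0 and x in X has y3 = mu and y1^2 + y2^2 <= y3^2, and conversely every such y is in the cone.

```python
def in_generated_cone(y):
    """Membership test for the cone generated by the disk x1^2 + x2^2 <= 1 at height x3 = 1."""
    y1, y2, y3 = y
    # y = mu * x with mu = y3 >= 0 and x in X is equivalent to y1^2 + y2^2 <= y3^2, y3 >= 0.
    return y3 >= 0 and y1 * y1 + y2 * y2 <= y3 * y3


def scale(mu, x):
    """Return the scalar multiple mu * x."""
    return tuple(mu * t for t in x)


boundary_point = (3, 4, 5)   # 9 + 16 = 25, so this lies on the surface of the cone
outside_point = (3, 4, 4)    # 25 > 16, so this lies outside the cone
```

Integer test points are used so that the comparisons are exact. Scaling a member by any mu >= 0, for instance `scale(2, boundary_point)`, gives another member, which is the defining property of a cone.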

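Negative: The negative $C^-$ of a cone $C = \{\mathbf{u}\}$ is the set of points
$C^- = \{-\mathbf{u}\}$.

$C^-$ is clearly a cone if $C$ is.

Sum: The sum of two cones $C_1 = \{\mathbf{u}\}$, $C_2 = \{\mathbf{v}\}$, written $C_1 + C_2$,
is the set of all points $\mathbf{u} + \mathbf{v}$, $\mathbf{u} \in C_1$, $\mathbf{v} \in C_2$.

The sum $C_1 + C_2$ is a cone because $\mu(\mathbf{u} + \mathbf{v}) = \mu\mathbf{u} + \mu\mathbf{v}$, and $\mu\mathbf{u} \in C_1$
if $\mathbf{u} \in C_1$, $\mu\mathbf{v} \in C_2$ if $\mathbf{v} \in C_2$, so that $\mu(\mathbf{u} + \mathbf{v}) \in C_1 + C_2$ for all $\mu \ge 0$.

Polar cone: If $C = \{\mathbf{u}\}$ is a cone, then $C^+$, the cone polar to $C$, is the
collection of points $\{\mathbf{v}\}$ such that $\mathbf{v}'\mathbf{u} \ge 0$ for each $\mathbf{v}$ in the set and all
$\mathbf{u} \in C$.

Obviously, $C^+$ is a cone since if $\mathbf{v}'\mathbf{u} \ge 0$ for all $\mathbf{u} \in C$, then $\mu\mathbf{v}'\mathbf{u} \ge 0$
for all $\mu \ge 0$. Intuitively, a polar cone is the collection of all vectors
which form a nonobtuse angle with all the vectors in $C$. Note that each
$\mathbf{v} \in C^+$ must form a nonobtuse angle with every vector in $C$.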


[Figure 6-16]

[Figure 6-17]
Examples: (1) Figure 6-17 shows the given cones $C_1$, $C_2$ as well as
$C_3 = C_1 + C_2$ and $C_1^-$.

(2) Figure 6-18 shows $C^+$, the cone polar to the given cone $C$.

We can easily prove that for any cones $C_1$, $C_2$,

$$(C_1 + C_2)^+ = C_1^+ \cap C_2^+,$$ (6-64)

and if $C_1 \subset C_2$, then $C_2^+ \subset C_1^+$. The details of the proofs are to be
supplied in Problems 6-38 and 6-39.
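
The identity for the polar of a sum can be illustrated for cones that are sums of finitely many half-lines. This is a sketch under stated assumptions, not a proof: the generators and sample vectors are made up, and the helper names are hypothetical. It relies on the fact that v'u >= 0 for every non-negative combination u of the generators exactly when v'a >= 0 for each generator a, and that the sum of two such cones is generated by the union of their generators.

```python
def dot(u, v):
    """Scalar product u'v."""
    return sum(a * b for a, b in zip(u, v))


def in_polar(v, generators):
    """True if v'a >= 0 for every generator a, i.e. v lies in the polar of the generated cone."""
    return all(dot(v, a) >= 0 for a in generators)


# Two cones in E^2, each given by assumed generators of its half-lines.
gens1 = [(1, 0), (1, 1)]
gens2 = [(0, 1), (-1, 2)]
gens_sum = gens1 + gens2  # generators of C1 + C2

samples = [(1, 1), (2, 1), (1, -1), (-1, 1), (0, 1), (3, 2), (-2, -1)]
polar_members = [v for v in samples if in_polar(v, gens_sum)]
agrees = all(
    in_polar(v, gens_sum) == (in_polar(v, gens1) and in_polar(v, gens2))
    for v in samples
)
```

For the sample vectors, membership in the polar of the sum coincides with membership in both individual polar cones, which is what (6-64) asserts.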

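Convex cone: A cone is a convex cone if it is a convex set.

All the cones in our examples have been convex cones. The cone in
Fig. 6-19 is not convex since it consists of two separate parts.

A set of points is a convex cone if and only if the sum $\mathbf{v}_1 + \mathbf{v}_2$ is in the
set when $\mathbf{v}_1$, $\mathbf{v}_2$ are, and if $\mu\mathbf{v}$ is in the set when $\mathbf{v}$ is, for any $\mu \ge 0$. To
see that the conditions are necessary, note that if $C$ is a convex cone, then
$\mu\mathbf{v}$ ($\mu \ge 0$) $\in C$ if $\mathbf{v} \in C$ by the definition of a cone. To see that $\mathbf{v}_1 + \mathbf{v}_2$
must be in $C$ if $\mathbf{v}_1, \mathbf{v}_2 \in C$, note that since $C$ is a cone, we can write

$$\mathbf{v}_1 = \lambda\boldsymbol{\omega}_1, \quad 0 < \lambda < 1, \quad \boldsymbol{\omega}_1 \in C, \qquad \mathbf{v}_2 = (1 - \lambda)\boldsymbol{\omega}_2, \quad \boldsymbol{\omega}_2 \in C.$$

However, because $C$ is convex, $\lambda\boldsymbol{\omega}_1 + (1 - \lambda)\boldsymbol{\omega}_2 \in C$, and thus $\mathbf{v}_1 + \mathbf{v}_2$
must be in $C$. To see that the conditions are sufficient, let us suppose that
the sum $\mathbf{v}_1 + \mathbf{v}_2$ is in the set if $\mathbf{v}_1$, $\mathbf{v}_2$ are, and $\mu\mathbf{v}$ is in the set if $\mathbf{v}$ is, for
all $\mu \ge 0$. The second condition indicates clearly that the set is a cone.
The cone will be convex if $\lambda\boldsymbol{\omega}_1 + (1 - \lambda)\boldsymbol{\omega}_2$, $0 \le \lambda \le 1$, is in the set
when $\boldsymbol{\omega}_1$, $\boldsymbol{\omega}_2$ are in the set. Define $\mathbf{v}_1 = \lambda\boldsymbol{\omega}_1$, $\mathbf{v}_2 = (1 - \lambda)\boldsymbol{\omega}_2$; both are
in the set by the second condition. The fact that $\mathbf{v}_1 + \mathbf{v}_2$ is then in the set
shows that the set is convex and therefore a convex cone.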


[Figure 6-18]

[Figure 6-19]
This set of conditions for convex cones looks very similar to those used
for defining a subspace of $E^n$. The only difference is that, for convex cones,
the scalar $\mu \ge 0$. Because of this restriction, a convex cone is not, in
general, a subspace of $E^n$, although it can be a subspace in certain cases.

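The sum $C_1 + C_2$ of two convex cones is also convex. Let $\mathbf{x}_1$, $\mathbf{x}_2$ be any
two points in $C_1 + C_2$. By definition of the sum,

$$\mathbf{x}_1 = \mathbf{u}_1 + \mathbf{v}_1, \quad \mathbf{x}_2 = \mathbf{u}_2 + \mathbf{v}_2, \quad \mathbf{u}_1, \mathbf{u}_2 \in C_1, \quad \mathbf{v}_1, \mathbf{v}_2 \in C_2.$$

Then for any point $\mathbf{x}$ which is a convex combination of $\mathbf{x}_1$, $\mathbf{x}_2$, we obtain

$$\mathbf{x} = \lambda\mathbf{x}_1 + (1 - \lambda)\mathbf{x}_2 = \lambda\mathbf{u}_1 + (1 - \lambda)\mathbf{u}_2 + \lambda\mathbf{v}_1 + (1 - \lambda)\mathbf{v}_2.$$

But $\lambda\mathbf{u}_1 + (1 - \lambda)\mathbf{u}_2 \in C_1$, $\lambda\mathbf{v}_1 + (1 - \lambda)\mathbf{v}_2 \in C_2$. Hence $\mathbf{x} \in C_1 + C_2$,
and $C_1 + C_2$ is a convex cone. In Problem 6-40 the reader will be required
to show that if the $C_i$ ($i = 1, \ldots, m$) are convex, $\sum_{i=1}^{m} C_i$ is convex too.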

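The cone generated by a convex set is a convex cone. Given the convex
set $X = \{\mathbf{x}\}$, we wish to prove that $C = \{\mathbf{y} \mid \mathbf{y} = \mu\mathbf{x}, \ \mu \ge 0, \text{ all } \mathbf{x} \in X\}$
is a convex cone. Clearly, $C$ is a cone. To show that $C$ is convex, we must
demonstrate that if $\mathbf{y}_1, \mathbf{y}_2 \in C$, then any convex combination of $\mathbf{y}_1$,
$\mathbf{y}_2$ is in $C$. Note that $\mathbf{y}_1 = \mu_1\mathbf{x}_1$, $\mathbf{y}_2 = \mu_2\mathbf{x}_2$, $\mathbf{x}_1, \mathbf{x}_2 \in X$. Thus we wish to
show that any point $\mathbf{y}$,

$$\mathbf{y} = \lambda\mu_1\mathbf{x}_1 + (1 - \lambda)\mu_2\mathbf{x}_2, \quad 0 \le \lambda \le 1,$$

is in the set. Write

$$\xi = \lambda\mu_1 + (1 - \lambda)\mu_2.$$

Hence if $\xi \ne 0$, then

$$\mathbf{y} = \xi[\alpha\mathbf{x}_1 + (1 - \alpha)\mathbf{x}_2], \quad \alpha = \frac{\lambda\mu_1}{\xi}, \quad 0 \le \alpha \le 1.$$

However, because $X$ is convex, $\mathbf{x} = \alpha\mathbf{x}_1 + (1 - \alpha)\mathbf{x}_2 \in X$, and $\mathbf{y} =
\xi\mathbf{x} \in C$. If $\xi = 0$, $\mathbf{y} = \mathbf{0}$, and $\mathbf{0} \in C$. Therefore $C$ is a convex cone.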
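
The rescaling step in this proof can be checked with exact arithmetic. The sketch below is illustrative only: the points, the values of lambda, mu1, mu2, and the names `rescale` and `combo` are assumptions. It computes the overall scale factor (written `xi` here) and the weight `alpha`, then confirms that the scaled convex combination reproduces y.

```python
from fractions import Fraction


def rescale(lam, mu1, mu2):
    """Return (xi, alpha) with lam*mu1 = xi*alpha and (1 - lam)*mu2 = xi*(1 - alpha)."""
    xi = lam * mu1 + (1 - lam) * mu2
    if xi == 0:
        return xi, None  # y is then the vertex 0, which lies in every cone
    return xi, lam * mu1 / xi


def combo(c1, p1, c2, p2):
    """Return c1*p1 + c2*p2 component by component."""
    return tuple(c1 * a + c2 * b for a, b in zip(p1, p2))


# Assumed sample data: two points of a convex set and non-negative multipliers.
x1, x2 = (1, 0), (0, 1)
lam, mu1, mu2 = Fraction(1, 2), Fraction(2), Fraction(4)

xi, alpha = rescale(lam, mu1, mu2)
y = combo(lam * mu1, x1, (1 - lam) * mu2, x2)   # convex combination of y1 and y2
x = combo(alpha, x1, 1 - alpha, x2)             # convex combination of x1 and x2
y_again = tuple(xi * t for t in x)              # the same point written as xi * x
```

With these values the scale factor is 3 and alpha is 1/3, so y is three times a point of the segment joining x1 and x2.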

The simplest convex cones are generated by a convex set containing
a single element (a set containing a single element is convex by definition).
If the single element $\mathbf{x} \ne \mathbf{0}$, then the convex cone is a line beginning
at the origin and extending indefinitely through $\mathbf{x}$, that is, the set of all
multiples $\mu\mathbf{x}$, $\mu \ge 0$. Such simple cones generated by a single point are
called half-lines.

Half-line: Given a single point $\mathbf{a} \ne \mathbf{0}$, a half-line or ray is defined as
the set

$$L = \{\mathbf{y} \mid \mathbf{y} = \mu\mathbf{a}, \text{ all } \mu \ge 0\}.$$ (6-65)

The symbol $L$ will always refer to a half-line.
The polar cone $H$ of a half-line $L$ is a closed half-space containing 0
on its bounding hyperplane because $H$ is the set

$$H = \{\mathbf{v} \mid \mathbf{a}'\mathbf{v} \ge 0\}.$$ (6-66)

This follows because $H$ is the collection of points $\mathbf{v}$ with $\mathbf{v}'\mathbf{y} \ge 0$ for all
$\mathbf{y} \in L$. However, any $\mathbf{y}$ can be written $\mu\mathbf{a}$, $\mu \ge 0$, and $\mu\mathbf{v}'\mathbf{a} \ge 0$ if $\mathbf{v}'\mathbf{a} \ge 0$.
Thus $H$ is the collection of points $\mathbf{v}$ for which $\mathbf{v}'\mathbf{a} = \mathbf{a}'\mathbf{v} \ge 0$, which is
a closed half-space.

Containing half-space: Given a cone $C$, let $\mathbf{a} \ne \mathbf{0}$ be an element of $C^+$.
Then the set

$$H_s = \{\mathbf{x} \mid \mathbf{a}'\mathbf{x} \ge 0\}$$ (6-67)

is a containing half-space for $C$.

Any containing half-space includes all of cone $C$. The boundary
hyperplane $P_s$ of any containing half-space passes through the origin:

$$P_s = \{\mathbf{x} \mid \mathbf{a}'\mathbf{x} = 0\}.$$ (6-68)

Example: A cone $C$ and a typical containing half-space are illustrated
in Fig. 6-20.

Orthogonal cone: Given a cone $C$ in $E^n$. The cone $C^\perp$ which is the set
of all vectors $\mathbf{u}$ in $E^n$, such that each $\mathbf{u}$ is orthogonal to every vector $\mathbf{v} \in C$,
that is, the set

$$C^\perp = \{\mathbf{u} \mid \mathbf{u}'\mathbf{v} = 0, \text{ all } \mathbf{v} \in C\}$$ (6-69)

is called the orthogonal cone to $C$.


[Figure 6-20]

[Figure 6-21]
$C^\perp$ is a cone since if $\mathbf{u}'\mathbf{v} = 0$, then $\mu\mathbf{u}'\mathbf{v} = 0$ for all $\mu \ge 0$. $C^\perp$ is not only
a cone, it is also a subspace of $E^n$. This follows because if $\mathbf{u} \in C^\perp$, $\mathbf{u}'\mathbf{v} = 0$
and $\mu\mathbf{u}'\mathbf{v} = 0$ for all $\mu$; therefore $\mu\mathbf{u}$ is in $C^\perp$ for any $\mu$. Furthermore, if
$\mathbf{u}_1, \mathbf{u}_2 \in C^\perp$, then $\mathbf{u}_1 + \mathbf{u}_2 \in C^\perp$ since $\mathbf{u}_1'\mathbf{v} = 0$ and $\mathbf{u}_2'\mathbf{v} = 0$ imply that
$(\mathbf{u}_1 + \mathbf{u}_2)'\mathbf{v} = 0$. In some cases, the set $C^\perp$ will contain only the element 0.
For example in $E^3$, $\mathbf{u} = \mathbf{0}$ is the only vector for which $\mathbf{u}'\mathbf{v} = 0$ when the
$\mathbf{v}$ are the elements of the cone shown in Fig. 6-16.

Example: In $E^3$ (see Fig. 6-21), cone $C^\perp$ (orthogonal to $C$) is all of the
$x_3$-axis.

Given any half-line $L$, the orthogonal cone $P$ to this half-line is a
hyperplane. If $\mathbf{a} \ne \mathbf{0}$ generates the half-line, $P$ is the set

$$P = \{\mathbf{x} \mid \mathbf{a}'\mathbf{x} = 0\},$$ (6-70)

which is a hyperplane through the origin.

Dimension of a cone: The dimension of a cone $C$ is defined as the
maximum number of linearly independent vectors in $C$.

The dimension of $C$ is the dimension of the "smallest subspace of $E^n$"
which contains $C$, that is, the dimension of the intersection of all subspaces
containing $C$.

It is easy to prove that the smallest subspace containing a convex cone
$C$ is $C + C^-$. The details of the proof are left to Problem 6-41.

Example: The dimension of the cone (Fig. 6-22) is 2 because there
are two linearly independent vectors in the cone. Thus the smallest
subspace of $E^2$ containing the cone is $E^2$ itself.

6-10 Convex polyhedral cones.

Convex polyhedral cone: A convex polyhedral cone $C$ is the sum of a
finite number of half-lines,

$$C = \sum_{i=1}^{r} L_i.$$ (6-71)

In this definition the term "sum" is used in the sense of sums of cones.
The cone $C$ defined by (6-71) is convex because the sum of convex cones
is a convex cone and half-lines are convex cones.

If the point $\mathbf{a}_i \ne \mathbf{0}$ generates the half-line $L_i$, then from (6-71) $C$ is the
collection of points

$$\mathbf{y} = \sum_{i=1}^{r} \mu_i \mathbf{a}_i, \quad \text{all } \mu_i \ge 0, \quad i = 1, \ldots, r.$$ (6-72)
Example: Figure 6-23 shows the convex polyhedral cone generated
by the half-lines 1, 2, 3. Note that any cross section of the polyhedral cone
is a convex polyhedron.

The cone $C$ generated by a convex polyhedron is a convex polyhedral cone.
Let the set $X$ be a convex polyhedron. Then any point $\mathbf{x}$ in $X$ can be
written as a convex combination of the extreme points $\mathbf{x}_i^*$ (assumed
to be $r$ in number):

$$\mathbf{x} = \sum_{i=1}^{r} \mu_i \mathbf{x}_i^*, \quad \mu_i \ge 0, \quad \sum_{i=1}^{r} \mu_i = 1.$$

Cone $C$ is the collection of points $\alpha\mathbf{x}$, all $\alpha \ge 0$ and all $\mathbf{x} \in X$. However,

$$\alpha\mathbf{x} = \sum_{i=1}^{r} \alpha\mu_i \mathbf{x}_i^* = \sum_{i=1}^{r} \lambda_i \mathbf{x}_i^*, \quad \lambda_i = \alpha\mu_i \ge 0.$$ (6-73)

Each extreme point $\mathbf{x}_i^*$ generates a half-line $L_i$; we see by (6-72) and (6-73)
that

$$C = \sum_{i=1}^{r} L_i;$$

thus $C$ is a convex polyhedral cone.

[Figure 6-22]

[Figure 6-23]

If $\mathbf{A}$ is an $n \times r$ matrix $\mathbf{A} = (\mathbf{a}_1, \ldots, \mathbf{a}_r)$, then the set of points

$$\mathbf{y} = \mathbf{A}\mathbf{x} = \sum_{i=1}^{r} x_i \mathbf{a}_i, \quad \text{all } \mathbf{x} \ge \mathbf{0},$$ (6-74)

is a convex polyhedral cone in $E^n$. This follows immediately from (6-72).
The columns $\mathbf{a}_i$ of $\mathbf{A}$ generate the half-lines whose sum yields the
polyhedral cone. The fact that $C = \{\mathbf{y}\}$, with $\mathbf{y}$ given by (6-74), is a
polyhedral cone indicates that there is a non-negative solution $\mathbf{x} \ge \mathbf{0}$ to the
set of simultaneous linear equations

$$\mathbf{A}\mathbf{x} = \mathbf{b}$$

if and only if $\mathbf{b}$ is an element of the convex polyhedral cone generated by
the columns of $\mathbf{A}$.
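
In the simplest case this criterion can be tested by hand or by a few lines of code. The sketch below is restricted to an assumed special case: A is 2 x 2 with linearly independent columns, so Ax = b has exactly one solution, and b lies in the cone precisely when that solution is non-negative. The function name and the sample vectors are illustrative, and Cramer's rule is used only because it is short.

```python
from fractions import Fraction


def nonneg_solution_2x2(a1, a2, b):
    """Return (x1, x2) with x1*a1 + x2*a2 = b if both are >= 0, otherwise None.

    Assumes a1 and a2 are linearly independent vectors in E^2 with integer components.
    """
    det = a1[0] * a2[1] - a2[0] * a1[1]
    if det == 0:
        raise ValueError("columns are linearly dependent")
    # Cramer's rule: replace one column by b and divide by det(A).
    x1 = Fraction(b[0] * a2[1] - a2[0] * b[1], det)
    x2 = Fraction(a1[0] * b[1] - b[0] * a1[1], det)
    return (x1, x2) if x1 >= 0 and x2 >= 0 else None


a1, a2 = (2, 1), (1, 3)                          # assumed columns of A
inside = nonneg_solution_2x2(a1, a2, (3, 4))     # b = a1 + a2 lies in the cone
outside = nonneg_solution_2x2(a1, a2, (1, -1))   # this b needs a negative coefficient
```

For a general n x r matrix the solution need not be unique, and deciding whether a non-negative one exists is a linear programming feasibility question rather than a single linear solve.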

Any given finite number of points $\mathbf{a}_1, \ldots, \mathbf{a}_r$ from $E^n$ can be thought of
as generating a convex polyhedral cone $C$. Each $\mathbf{a}_i$ generates a half-line,
and the cone $C$ is the sum of these $r$ half-lines. Similarly, we can imagine
that any convex polyhedral cone in $E^n$ has been generated by a finite
number of points $\mathbf{a}_1, \ldots, \mathbf{a}_r$ from $E^n$. We only need to choose one
nonzero point on each of the half-lines whose sum yields the cone. If $r \ge n$, and
if in addition there are $n$ points in the set $\mathbf{a}_1, \ldots, \mathbf{a}_r$ which are linearly
independent, then the points $\mathbf{a}_1, \ldots, \mathbf{a}_r$ generate a cone of dimension $n$,
that is, no subspace of $E^n$ containing $C$ has dimension $< n$. The cone
shown in Fig. 6-23, for example, has dimension 3, the dimension of $E^3$;
it is generated by three linearly independent points.

Suppose that we have in $E^n$ an $n$-dimensional cone $C$ generated by
$\mathbf{a}_1, \ldots, \mathbf{a}_r$. Out of the set $\mathbf{a}_1, \ldots, \mathbf{a}_r$ let us choose any $n - 1$ linearly
independent points $\mathbf{b}_1, \ldots, \mathbf{b}_{n-1}$. These points determine a unique
hyperplane $\mathbf{c}\mathbf{x} = 0$ through the origin in $E^n$ because $\mathbf{c}\mathbf{b}_1 = 0, \ldots, \mathbf{c}\mathbf{b}_{n-1} = 0$
form a set of $n - 1$ homogeneous linear equations in the $n$ unknowns, the
$c_i$. Furthermore, since $\mathbf{b}_1, \ldots, \mathbf{b}_{n-1}$ are linearly independent, there is a
nonvanishing determinant of order $n - 1$ in the matrix $(\mathbf{b}_1, \ldots, \mathbf{b}_{n-1})$;
hence the $c_i$ are determined up to a common multiplicative constant.
Thus the hyperplane is uniquely determined. It may or may not be true
that all of the cone $C$ will lie in one of the closed half-spaces $\mathbf{c}\mathbf{x} \ge 0$ or
$\mathbf{c}\mathbf{x} \le 0$. If $C$ does lie in one closed half-space produced by $\mathbf{c}\mathbf{x} = 0$, we
make the following definitions:

Extreme supporting half-space: The set of points from $E^n$,

$$H_F = \{\mathbf{v} \mid \mathbf{c}\mathbf{v} \ge 0\},$$ (6-75)

is an extreme supporting half-space for the $n$-dimensional convex
polyhedral cone $C$ generated by the points $\mathbf{a}_1, \ldots, \mathbf{a}_r$ if $C$ lies in the half-space
$H_F$ and $n - 1$ linearly independent points from the set $\mathbf{a}_1, \ldots, \mathbf{a}_r$ lie on
the hyperplane $\mathbf{c}\mathbf{v} = 0$.

Extreme supporting hyperplane: The hyperplane $\mathbf{c}\mathbf{v} = 0$ which forms
the boundary of the extreme supporting half-space is called an extreme
supporting hyperplane for the convex polyhedral cone $C$.

The reader should be careful to note the difference between an extreme
supporting hyperplane for a convex polyhedral cone and a supporting
hyperplane for any convex set as defined earlier. An extreme supporting
hyperplane must have $n - 1$ linearly independent points of the cone lying
on it, while a supporting hyperplane need not have more than a single
point in common with the cone.

It is clear that the intersection $F$ of an extreme supporting hyperplane
$\mathbf{c}\mathbf{v} = 0$ and the polyhedral cone $C$ yields a collection of points which are
boundary points of $C$; that is, every point of $C$ on the extreme supporting
hyperplane $\mathbf{c}\mathbf{v} = 0$ is a boundary point of $C$ since $C$ lies in the half-space
$\mathbf{c}\mathbf{v} \ge 0$. Furthermore, the intersection $F$ is itself a convex polyhedral
cone generated by the points from $\mathbf{a}_1, \ldots, \mathbf{a}_r$ which lie on the
hyperplane $\mathbf{c}\mathbf{v} = 0$. The proof is straightforward, and the details are left
to Problem 6-43. Since the set $F$ is a subset of $C$ and a convex
polyhedral cone, we call $F$ a subcone of $C$. The subcone $F$ lies in the
hyperplane $\mathbf{c}\mathbf{v} = 0$. Since $F$ contains $n - 1$ linearly independent points and
lies in a hyperplane through the origin, $F$ has dimension $n - 1$. In Fig. 6-23
any two of the vectors $\mathbf{a}_1$, $\mathbf{a}_2$, $\mathbf{a}_3$ uniquely determine a plane through the
origin such that the cone $C$ is contained in one half-space produced by the
plane. Furthermore, any two points from $\mathbf{a}_1$, $\mathbf{a}_2$, $\mathbf{a}_3$ are linearly independent,
and hence the resulting hyperplane is an extreme supporting hyperplane. The
intersection of $C$ with any one of the three extreme supporting hyperplanes
yields a face of the cone $C$. This face is a cone, namely the cone $F$ discussed
above. In general, we call the $(n - 1)$-dimensional convex polyhedral cone $F$
which is the intersection of an $n$-dimensional polyhedral cone $C$ in $E^n$ with
an extreme supporting hyperplane a facet or face of the cone $C$.
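
In E^3 the search for extreme supporting hyperplanes can be carried out by brute force. The following sketch is an illustration under stated assumptions, not a method from the text: it assumes a 3-dimensional cone with integer generators, uses the cross product of two generators as the normal c of the plane through them, and keeps the plane if every generator lies on one side. The generators and function names are hypothetical.

```python
from itertools import combinations


def cross(u, v):
    """Cross product in E^3; the result is orthogonal to both u and v."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def dot(u, v):
    """Scalar product of two vectors."""
    return sum(a * b for a, b in zip(u, v))


def extreme_supporting_normals(gens):
    """Normals c with c.a >= 0 for all generators and two independent generators on c.x = 0.

    Assumes the generated cone is 3-dimensional. The same plane may be reported
    more than once if more than two generators lie on it.
    """
    normals = []
    for u, v in combinations(gens, 2):
        c = cross(u, v)
        if c == (0, 0, 0):
            continue  # u and v are linearly dependent and determine no plane
        values = [dot(c, a) for a in gens]
        if all(t >= 0 for t in values):
            normals.append(c)
        elif all(t <= 0 for t in values):
            normals.append(tuple(-t for t in c))  # flip so the cone lies in c.x >= 0
    return normals


# Assumed generators: the unit vectors plus one point interior to their cone.
gens = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1)]
normals = extreme_supporting_normals(gens)
```

Here the interior generator (1, 1, 1) contributes no facet: every plane through it and another generator leaves generators on both sides, so only the three coordinate planes survive.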

An $n$-dimensional convex polyhedral cone in $E^n$ generated by $\mathbf{a}_1, \ldots, \mathbf{a}_r$
can have only a finite number of extreme supporting hyperplanes since
there are at most $r!/[(n - 1)!\,(r - n + 1)!]$ sets of $n - 1$ linearly
independent points in the set $\mathbf{a}_1, \ldots, \mathbf{a}_r$. Not every $n$-dimensional convex
polyhedral cone in $E^n$ needs to have an extreme supporting hyperplane.
The convex polyhedral cone may be all of $E^n$ and thus cannot lie in any
half-space of $E^n$. For example, in $E^2$ the convex polyhedral cone generated
by the points (0, 1), (1, 0), (0, −1), (−1, 0) is all of $E^2$ and hence does
not have an extreme supporting half-space. An $n$-dimensional convex
polyhedral cone which contains every point of $E^n$ is called solid.
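
That these four points generate all of E^2 can be seen by writing an arbitrary point as a non-negative combination of them. The sketch below does this explicitly; the function names and the sample point are illustrative assumptions.

```python
# Generators from the example, in the order (0, 1), (1, 0), (0, -1), (-1, 0).
GENERATORS = ((0, 1), (1, 0), (0, -1), (-1, 0))


def nonneg_coefficients(y):
    """Non-negative multipliers expressing y in E^2 in terms of GENERATORS."""
    y1, y2 = y
    # The positive part of each component goes on the generator pointing that way.
    return (max(y2, 0), max(y1, 0), max(-y2, 0), max(-y1, 0))


def combine(coeffs, points):
    """Sum of coeffs[i] * points[i], component by component."""
    return tuple(sum(c * p[k] for c, p in zip(coeffs, points)) for k in range(2))


y = (-3, 2)
mu = nonneg_coefficients(y)
rebuilt = combine(mu, GENERATORS)
```

Since every point of E^2 can be reached in this way, no half-space through the origin can contain the cone.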

If an $n$-dimensional convex polyhedral cone is not all of $E^n$, that is, if
it is not solid, then it does have an extreme supporting hyperplane.

6-11 Linear transformations of regions. Consider a linear transformation
represented by the $m \times n$ matrix $\mathbf{A}$ which takes $E^n$ into a subspace
of $E^m$. Frequently, we wish to know how the transformation affects
some region in $E^n$, that is, what image the region will assume in $E^m$.

One of the most important properties of a linear transformation is that
it always takes lines into lines or a line into a point. It also takes
hyperplanes in $E^n$ into hyperplanes in $E^m$, or a hyperplane in $E^n$ into the
intersection of two or more hyperplanes in $E^m$, or a hyperplane in $E^n$ into all
of $E^m$. We have noted previously that a line in $E^n$ is the set of points

$$\mathbf{x} = \lambda\mathbf{x}_1 + (1 - \lambda)\mathbf{x}_2, \quad \text{all } \lambda, \quad \mathbf{x}_1 \ne \mathbf{x}_2.$$ (6-76)

The set of points $\mathbf{y} = \mathbf{A}\mathbf{x}$ for $\mathbf{x}$ given by (6-76) is

$$\mathbf{y} = \lambda\mathbf{A}\mathbf{x}_1 + (1 - \lambda)\mathbf{A}\mathbf{x}_2 = \lambda\mathbf{y}_1 + (1 - \lambda)\mathbf{y}_2,$$ (6-77)

where $\mathbf{y}_1 = \mathbf{A}\mathbf{x}_1$, $\mathbf{y}_2 = \mathbf{A}\mathbf{x}_2$. If $\mathbf{y}_1 \ne \mathbf{y}_2$, we obtain a line in $E^m$. When
$\mathbf{y}_1 = \mathbf{y}_2$, the line in $E^n$ is transformed into a point in $E^m$. If $\mathbf{A}$ is a
nonsingular $n$th-order matrix, $\mathbf{y}_1$ will differ from $\mathbf{y}_2$ (when $\mathbf{x}_1 \ne \mathbf{x}_2$) so that
a nonsingular transformation takes a line in $E^n$ into another line in $E^n$.

To determine the effects of a linear transformation on a hyperplane
$\mathbf{c}\mathbf{x} = z$ in $E^n$, we choose $n$ points $\mathbf{x}_1, \ldots, \mathbf{x}_n$ on the hyperplane such that
the vectors $\mathbf{x}_1 - \mathbf{x}_n, \ldots, \mathbf{x}_{n-1} - \mathbf{x}_n$ are linearly independent. Then the
hyperplane is the set of all points $\mathbf{x}$ for which we can write

$$\mathbf{x} - \mathbf{x}_n = \sum_{i=1}^{n-1} \lambda_i(\mathbf{x}_i - \mathbf{x}_n).$$ (6-78)

If we set $\mathbf{y} = \mathbf{A}\mathbf{x}$, $\mathbf{y}_i = \mathbf{A}\mathbf{x}_i$, $i = 1, \ldots, n$, then the set of points in $E^m$
which is the image of the hyperplane can be written

$$\mathbf{y} - \mathbf{y}_n = \sum_{i=1}^{n-1} \lambda_i(\mathbf{y}_i - \mathbf{y}_n).$$ (6-79)

If $m$ of the $\mathbf{y}_i - \mathbf{y}_n$ are linearly independent, the image of the
hyperplane is all of $E^m$. When $m - 1$ of the $\mathbf{y}_i - \mathbf{y}_n$ are linearly independent,
the image of the hyperplane in $E^n$ is a hyperplane in $E^m$. If fewer than $m - 1$
of the $\mathbf{y}_i - \mathbf{y}_n$ are linearly independent, the image of the hyperplane
in $E^n$ lies in the intersection of two or more hyperplanes in $E^m$. In the
event that $\mathbf{A}$ is an $n$th-order nonsingular matrix, the transformation takes
a hyperplane in $E^n$ into another hyperplane in $E^n$. If $\mathbf{x} = \mathbf{A}^{-1}\mathbf{y}$, then
$\mathbf{c}\mathbf{x} = z$ becomes $\mathbf{c}\mathbf{A}^{-1}\mathbf{y} = z$. This hyperplane has a normal $\mathbf{c}\mathbf{A}^{-1}$, whereas
the original hyperplane had a normal $\mathbf{c}$.
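
The change of normal under a nonsingular transformation is easy to confirm on a small case. The sketch below uses an assumed 2 x 2 integer matrix and an assumed line cx = z in E^2; the helper names are hypothetical. It computes the row vector c times the inverse of A and checks that the images of two points of the original line satisfy the transformed equation.

```python
from fractions import Fraction


def inverse_2x2(A):
    """Inverse of a nonsingular 2 x 2 integer matrix, with exact rational entries."""
    (a, b), (c, d) = A
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return ((Fraction(d, det), Fraction(-b, det)),
            (Fraction(-c, det), Fraction(a, det)))


def mat_vec(A, x):
    """Column vector A x."""
    return tuple(sum(a * t for a, t in zip(row, x)) for row in A)


def row_times_mat(c, A):
    """Row vector c A."""
    return tuple(sum(c[i] * A[i][j] for i in range(len(c))) for j in range(len(A[0])))


def dot(u, v):
    """Scalar product of two vectors."""
    return sum(a * b for a, b in zip(u, v))


A = ((2, 1), (1, 1))        # assumed nonsingular matrix
c, z = (3, 2), 6            # assumed hyperplane 3*x1 + 2*x2 = 6
new_normal = row_times_mat(c, inverse_2x2(A))

points = [(2, 0), (0, 3)]   # two points on the original line
images = [mat_vec(A, x) for x in points]
```

With these values the new normal is (1, 1), and both image points satisfy y1 + y2 = 6.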

The linear transformation represented by the $m \times n$ matrix $\mathbf{A}$ takes a
cone in $E^n$ into a cone in $E^m$ because if $C = \{\mathbf{x}\}$, then when $\mathbf{y} = \mathbf{A}\mathbf{x}$,
it follows that $\mu\mathbf{y} = \mu\mathbf{A}\mathbf{x} = \mathbf{A}(\mu\mathbf{x})$, $\mu \ge 0$, is an element of the image
of $C$ since $\mu\mathbf{x} \in C$. Thus if $\mathbf{y}$ is in the image of $C$, so is $\mu\mathbf{y}$ for $\mu \ge 0$.
Hence the image of $C$ is also a cone. In general, a linear transformation
takes any convex set into a convex set. (To be proved in Problem 6-18.)
[Figure 6-24]

[Figure 6-25]

Example: Let us study what the linear transformation

$$\mathbf{y} = \begin{bmatrix} 1 & \tfrac{1}{2} \\ 0 & 1 \end{bmatrix} \mathbf{x}$$

does to the rectangular region of the $x_1x_2$-plane shown in Fig. 6-24.
Because the transformation is nonsingular, lines will be taken into lines. 
Hence it is only necessary to examine what happens to the corners of the 
region. Once the images of the corners have been determined, the image 
of the rectangular region is found by joining the images of the corners 
with straight lines. The origin goes into the origin and hence this corner 
remains unchanged. Corner x = [0, 2] becomes y = [1, 2], x = [1, 2] 
becomes y = [2, 2], and x = [1, 0] becomes y = [1, 0]. Thus Fig. 6-25 
shows the image of the rectangular region shown in Fig. 6-24. The region 
has been sheared by the linear transformation. 
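
The corner images quoted in this example can be reproduced in a few lines. The matrix used below is an assumption inferred from those images: [1, 0] going to [1, 0] and [0, 2] going to [1, 2] force the columns (1, 0) and (1/2, 1). The function name `transform` is hypothetical.

```python
from fractions import Fraction

# Shear matrix inferred from the corner images stated in the example (an assumption).
A = ((Fraction(1), Fraction(1, 2)),
     (Fraction(0), Fraction(1)))


def transform(A, x):
    """Image y = A x of the point x."""
    return tuple(sum(a * t for a, t in zip(row, x)) for row in A)


corners = [(0, 0), (0, 2), (1, 2), (1, 0)]
images = [transform(A, x) for x in corners]
```

Joining the image corners in the same order gives the sheared region: the side on the x1-axis stays fixed while the opposite side slides one unit to the right.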

All linear transformations of $E^n$ into a subspace of $E^m$ take the origin
of $E^n$ into the origin of $E^m$ since $\mathbf{y} = \mathbf{A}\mathbf{0} = \mathbf{0}$. This is equivalent to the
statement that a linear transformation will never translate a region. A
more general transformation of the form $\mathbf{y} = \mathbf{A}\mathbf{x} + \mathbf{b}$, $\mathbf{b} \ne \mathbf{0}$, takes the
origin of $E^n$ into the point $\mathbf{b}$ of $E^m$. It performs a so-called affine
transformation. Affine transformations are not linear. (What is the relation

in Section 5-7?) 
Problems

6-1. Show graphically the regions represented by the following point sets:

(a) $X = \{[x_1, x_2] \mid x_1^2 + x_2^2 > 1\}$;

(b) $X = \{[x_1, x_2] \mid x_1^2 + 3x_2^2 < 6\}$;

(c) $X = \{[x_1, x_2] \mid x_1 > 2, \ x_2 < 4\}$.

6-2. Show graphically $A \cap B$ and $A \cup B$ in the following cases:

(a) $A = \{[x_1, x_2] \mid x_1 > 2\}$, $B = \{[x_1, x_2] \mid x_1 < 3\}$;

(b) $A = \{[x_1, x_2] \mid x_1^2 + x_2^2 < 4\}$, $B = \{[x_1, x_2] \mid x_1^2 + x_2 < 1\}$;

(c) $A = \{[x_1, x_2] \mid x_1 > 0, \ x_2 > 0, \ x_1 < 1, \ x_2 < 1\}$,

$B = \{[x_1, x_2] \mid (x_1 - 1)^2 + x_2^2 < 1\}$.

6-3. Illustrate graphically $A_1 \cap A_2 \cap A_3$, $A_1 \cup A_2 \cup A_3$:

$A_1 = \{[x_1, x_2] \mid x_1 - x_2 > 0\}$,

$A_2 = \{[x_1, x_2] \mid x_1 x_2 < 1\}$,

$A_3 = \{[x_1, x_2] \mid 3x_1 + 2x_2 > 0\}$.

6-4. Show that a line in E n is a closed set. 
6-5. Give the complements of the following sets and indicate whether the
complements are open or closed sets (or neither).

(a) $\mathbf{c}\mathbf{x} < z$; (b) $|\mathbf{x} - \mathbf{a}| < \epsilon$;

(c) $|\mathbf{x} - \mathbf{a}| = \epsilon$; (d) $E^n$.

6-6. Draw the line $6x_1 + 3x_2 = 4$ and a vector normal to it. Find the line
passing through (2, 2) with the same normal. Can this line be obtained from
$6x_1 + 3x_2 = 4$ by moving it parallel to itself in the direction of the normal?

6-7. Express the point (0.5, 1) as a convex combination of (2, 0), (0, 4/3).

6-8. Let $\mathbf{x}$ be a point on the line joining $\mathbf{x}_1$ and $\mathbf{x}_2$; if $\mathbf{x}$ is a fraction $\mu$ of the
distance from $\mathbf{x}_1$ to $\mathbf{x}_2$, write $\mathbf{x}$ as a convex combination of $\mathbf{x}_1$, $\mathbf{x}_2$.

6-9. Given three points $\mathbf{x}_1$, $\mathbf{x}_2$, $\mathbf{x}_3$ in $E^n$, how can we quickly ascertain whether
they lie on the same line? How can we decide whether $\mathbf{x}_3$ lies between $\mathbf{x}_1$, $\mathbf{x}_2$?
Hint: Consider $\mathbf{x}_1 - \mathbf{x}_2$ and $\mathbf{x}_1 - \mathbf{x}_3$.

6-10. Given the hyperplane $3x_1 + 2x_2 + 4x_3 + 6x_4 = 7$. In which
half-space is the point $\mathbf{x} = [6, 1, 7, 2]$?

6-11. Consider the hyperplane cx = 0 in E n . Prove that this hyperplane is 
a subspace of dimension n — 1 . Conversely, prove that any subspace of dimen¬ 
sion n — 1 in E n is a hyperplane through the origin. 

6-12. Prove that $E^n$ and any subspace of $E^n$ are convex sets.

6-13. Is a set convex if, given any two points $\mathbf{x}_1$, $\mathbf{x}_2$ in the set, the point
$\mathbf{x} = \tfrac{1}{2}(\mathbf{x}_1 + \mathbf{x}_2)$, i.e., the point halfway between these points, is also in the set?
Can you give a counterexample?

6-14. Which of the following sets are convex?

(a) $X = \{[x_1, x_2] \mid 3x_1^2 + 2x_2^2 < 6\}$;

(b) $X = \{[x_1, x_2] \mid x_1 > 2, \ x_1 < 3\}$;

(c) $X = \{[x_1, x_2] \mid x_1 x_2 < 1, \ x_1 > 0, \ x_2 > 0\}$;

(d) $X = \{[x_1, x_2] \mid x_2 - 3 > -x_1^2, \ x_1 > 0, \ x_2 > 0\}$.

6-15. Prove that if $X_1, \ldots, X_n$ are convex sets, then $\bigcap_{i=1}^{n} X_i$ is a convex
set. Is $\bigcup_{i=1}^{n} X_i$ convex? If not, give a counterexample.

6-16. If $X_1, \ldots, X_n$ in Problem 6-15 are closed, prove that $\bigcap_{i=1}^{n} X_i$ is closed
also.
also. 

6-17. Sketch the convex polyhedra generated by the following sets of points:

(a) (0, 0), (1, 0), (0, 1), (1, 1);

(b) (3, 4), (5, 6), (0, 0), (2, 2), (1, 0), (2, 5), (4, 7);

(c) (-1, 2), (3, -4), (4, 4), (0, 0), (6, 5), (7, 1).

6-18. Prove that a linear transformation of E n into a subspace of E m will 
take a convex set into a convex set. 

6-19. Show that a nonsingular linear transformation takes an extreme point 
of a convex set into an extreme point. Give an example illustrating that a 
singular linear transformation can take an extreme point into an interior point. 
6-20. Prove that the convex polyhedron generated by a finite number of
points $\mathbf{x}_1, \ldots, \mathbf{x}_m$ is strictly bounded if each $\mathbf{x}_i$ is of finite length.

6-21. Prove that a closed convex set which is bounded from above has an 
extreme point in every supporting hyperplane. 

6-22. Prove that a strictly bounded, closed convex set has an extreme point
in every supporting hyperplane by showing that the point in the intersection of
the hyperplane and the convex set which is farthest removed from the origin is
an extreme point.

6-23. Show that any nonempty closed convex set which is bounded from below 
contains at least one extreme point. If a set contains only one point, this point 
will be considered an extreme point. 

6-24. Prove that a strictly bounded, closed convex set with two points in any
supporting hyperplane has at least two extreme points. Hint: According to
Problem 6-22, $\mathbf{t}^*$, the point in the intersection of the hyperplane and the convex
set farthest from the origin, was an extreme point. Consider the point in the
intersection farthest from $\mathbf{t}^*$.

6-25. Consider the intersection of the following half-spaces or hyperplanes:
$\mathbf{a}^i\mathbf{x} \ \{\le, =, \ge\} \ b_i$, $i = 1, \ldots, m$.

Find the set of points which is a subset of the above set and includes all its
boundary points.

6-26. Show that the boundary of the intersection of m half-spaces is connected. 

6-27. Show that every point in the intersection of m hyperplanes in E n is a 
boundary point of this set and hence that the set is closed. 

6-28. Prove that the intersection of $m$ hyperplanes has no extreme points
when the variables are unrestricted in sign, unless there is only a single point in
the intersection. Hint: Let $\mathbf{x}_1$, $\mathbf{x}_2$ be any two points in the intersection. Then
$\mathbf{x}_3 = \mathbf{x}_1 + \lambda(\mathbf{x}_2 - \mathbf{x}_1)$ is also in the intersection for any real $\lambda$. Is $\mathbf{x}_1$ ever an
extreme point?

6-29. What happens to the theorem of the separating hyperplanes when the 
convex set is open? Is the theorem true if the set is not closed? Hint: Consider 
a set which contains some but not all of its boundary points. 

6-30. Given the convex set $X = \{[x_1, x_2] \mid x_1^2 + x_2^2 \le 1\}$. Find the
equation for the supporting hyperplane at any boundary point $(\xi, \eta)$, and the
equation for the supporting hyperplane at $[\sqrt{2}/2, \sqrt{2}/2]$.

6-31. Prove that the convex hull of a finite number of points is a closed set. 

6-32. Consider the triangle with vertices (0,0), (2,0), (1, 1). Express the 
point (0.5, 0.5) as a convex combination of the extreme points. Do the same 
for point (0.3, 0.2). 

6-33. Show that the intersection of a finite number of closed half-spaces 
(and perhaps some hyperplanes) can have only a finite number of extreme 
points in E n . Prove that if the number m of half-spaces is less than n, there is 
no extreme point at all. 

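6-34. Prove that the set of solutions to $\mathbf{A}\mathbf{x} = \mathbf{b}$ is a convex set by showing
that if $\mathbf{x}_1$, $\mathbf{x}_2$ are solutions, so is $\lambda\mathbf{x}_1 + (1 - \lambda)\mathbf{x}_2$, $0 \le \lambda \le 1$. Demonstrate
by the same method that the set of solutions with $\mathbf{x} \ge \mathbf{0}$ is a convex set.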
6-35. Prove that a strictly bounded intersection of $m$ closed half-spaces is
a convex polyhedron. Hint: It is only necessary to show that there are a finite
number of extreme points.

6-36. What difficulties would be involved in solving a linear programming
problem if the constraints were open (i.e., have a ">" or "<" sign) rather than
closed half-spaces? Would the absolute maximum or minimum actually be
taken on by a point in the set of feasible solutions?

6-37. Sketch the half-lines generated by the points $\mathbf{a}_1 = [2, 1]$, $\mathbf{a}_2 = [1, 3]$,
$\mathbf{a}_3 = [-1, 2]$. Sketch the cone $C$ which is the sum of these three half-lines.
Sketch $C^+$, $C^-$, $C^\perp$.

6-38. If $C_1$, $C_2$ are cones and $C_1 \subset C_2$, prove that $C_2^+ \subset C_1^+$.

6-39. If $C_1$, $C_2$ are cones, prove that $(C_1 + C_2)^+ = C_1^+ \cap C_2^+$. Generalize
this result and show that

$$\left(\sum_{i=1}^{m} C_i\right)^+ = \bigcap_{i=1}^{m} C_i^+.$$

Hint: If $\mathbf{u}_1 \in C_1$, $\mathbf{u}_2 \in C_2$, then any vector $\mathbf{v}$ satisfying $\mathbf{v}'(\mathbf{u}_1 + \mathbf{u}_2) \ge 0$ must
also satisfy $\mathbf{v}'\mathbf{u}_1 \ge 0$, $\mathbf{v}'\mathbf{u}_2 \ge 0$, and hence $(C_1 + C_2)^+ \subset C_1^+ \cap C_2^+$.

6-40. Let $C_i$, $i = 1, \ldots, m$ be cones. Show that $C = \sum_{i=1}^{m} C_i$ is also a cone.
If the $C_i$ are convex cones, show that $C$ is also convex.

6-41. Prove that $C + C^-$ is the intersection of all subspaces containing the
convex cone $C$, that is, $C + C^-$ is the smallest subspace containing $C$. Is this
true if $C$ is not convex? Can you supply a counterexample?

6-42. Sketch the convex polyhedral cone generated by the points (1, 0, 0), 
(1,1,1), (0,1,0). 

6-43. Prove that the intersection of an $n$-dimensional convex polyhedral cone
$C$ (generated by $\mathbf{a}_1, \ldots, \mathbf{a}_r$) in $E^n$ with an extreme supporting hyperplane is a
polyhedral cone $F$ of dimension $n - 1$. Prove that $F$ is the polyhedral cone
generated by the points from the set $\mathbf{a}_1, \ldots, \mathbf{a}_r$ lying on the extreme supporting
hyperplane.

6-44. Prove that the intersection of $k$ hyperplanes $\mathbf{c}^i\mathbf{x} = 0$ in $E^n$ (the $\mathbf{c}^i$ are
linearly independent) is a subspace of dimension $n - k$. Show that any
subspace of dimension $n - k$ can be represented as the intersection of $k$
hyperplanes. Is this representation unique? Hint: Review Section 5-6.

6-45. The lineality of a cone C is the dimension of the subspace which is 
the convex hull of all subspaces in C; hence it is the dimension of the smallest 
subspace containing all the subspaces in C. This space is the lineality space of C. 
Prove that if C is convex, the lineality space of C is contained in C. A convex 
cone with zero lineality is called pointed. Illustrate this geometrically. 

6-46. Show that the non-negative orthant of $E^n$ is a convex polyhedral cone.

6-47. Prove: If a convex polyhedral cone $C$ in $E^n$ contains no vector $\mathbf{x} < \mathbf{0}$,
then $C^+$ contains a vector $\mathbf{y} \ge \mathbf{0}$, $\mathbf{y} \ne \mathbf{0}$. Hint: If $P$ is the non-negative orthant,
then $C + P$ does not contain $P^-$ and therefore $C + P$ is not all of $E^n$.
Consequently, there is a vector $\mathbf{w} \ne \mathbf{0}$ in $(C + P)^+$. But $(C + P)^+ = C^+ \cap P^+$,
and $P^+ = P$, since $\mathbf{e}_1, \ldots, \mathbf{e}_n$ are in $P$; if $\mathbf{w}'\mathbf{u} \ge 0$ for each $\mathbf{u} \in P$, then each
component of $\mathbf{w}$ is non-negative.

6-48. Referring to Problem 6-47, consider the hyperplane $\mathbf{y}'\mathbf{x} = 0$. Show that
$C$ is contained in $\mathbf{y}'\mathbf{x} \ge 0$ and $P^-$ in $\mathbf{y}'\mathbf{x} \le 0$. Thus the hyperplane $\mathbf{y}'\mathbf{x} = 0$
separates $C$ and $P^-$.

6-49. If A is an n X r matrix, show that the following sentence is a state¬ 
ment of Problem 6-47 in matrix language: If Ax > 0 for all x > 0, then there 
exists w > 0 in E n such that w'A > 0. 

6-50. Given an $n \times r$ matrix $\mathbf{A}$, demonstrate that there exists either a vector
$\mathbf{x} \ge \mathbf{0}$, $\sum_{i=1}^{r} x_i = 1$, such that $\mathbf{A}\mathbf{x} \le \mathbf{0}$, or a vector $\mathbf{w} > \mathbf{0}$, $\sum_{i=1}^{n} w_i = 1$, such
that $\mathbf{w}'\mathbf{A} > \mathbf{0}$. Prove the result by showing that such an $\mathbf{x}$ exists if 0 is in the
convex polyhedron spanned by the columns of $\mathbf{A}$ and the unit vectors. If 0 is
not in the convex polyhedron, demonstrate, by means of the theorem on
separating hyperplanes, that there is a hyperplane through 0 such that each point
of the convex polyhedron is in one open half-space. Show that $\mathbf{w}$ is thus a normal
to this hyperplane. Can this problem be solved immediately using the results of
Problem 6-49? (Von Neumann and Morgenstern call this the theorem of
alternatives for matrices.)

6-51. Show that the two alternatives mentioned in Problem 6-50 are mutually 
exclusive, that is, vectors x, w cannot both exist. Hint: Assume x and w exist 
and consider w'Ax. 

6-52. Prove that a convex polyhedral cone is a closed set. 

6-53. Interpret the solutions to Ax = b, with A being an m X n matrix, as 
the intersection of m hyperplanes in E n . What is the geometrical interpretation 
of inconsistent equations? What is the geometrical interpretation of redundant 
equations? 

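6-54. Consider the convex set of solutions to $\mathbf{A}\mathbf{x} = \mathbf{b}$, $\mathbf{x} \ge \mathbf{0}$, that is, the set
of non-negative solutions to $\mathbf{A}\mathbf{x} = \mathbf{b}$. Show that every extreme point of the
convex set is a basic solution to $\mathbf{A}\mathbf{x} = \mathbf{b}$ with $\mathbf{x}_B \ge \mathbf{0}$. Also show that every basic
solution to $\mathbf{A}\mathbf{x} = \mathbf{b}$ with $\mathbf{x}_B \ge \mathbf{0}$ is an extreme point of the convex set. Hint:
Let $\mathbf{x} = [\mathbf{x}_B, \mathbf{0}] \ge \mathbf{0}$ be a non-negative basic solution to $\mathbf{A}\mathbf{x} = \mathbf{b}$. Do there exist
any other non-negative solutions $\mathbf{x}_1$, $\mathbf{x}_2$ such that $\mathbf{x} = \lambda\mathbf{x}_1 + (1 - \lambda)\mathbf{x}_2$,
$0 < \lambda < 1$? Consider the last $n - m$ components of this relation. To show
that an extreme point is a basic non-negative solution, let $\mathbf{x}^*$ be an extreme point.
Now show that the columns of $\mathbf{A}$ associated with positive variables are linearly
independent. To do this assume that the positive variables appear in the first $k$
components of $\mathbf{x}^*$. Next assume that $\sum_{i=1}^{k} \lambda_i \mathbf{a}_i = \mathbf{0}$ and at least one $\lambda_i \ne 0$.
Let $\boldsymbol{\lambda}$ be the $n$-component vector $[\lambda_1, \ldots, \lambda_k, 0, \ldots, 0]$. Choose an $\epsilon$ so small
that $\mathbf{x}_1 = \mathbf{x}^* + \epsilon\boldsymbol{\lambda} \ge \mathbf{0}$ and $\mathbf{x}_2 = \mathbf{x}^* - \epsilon\boldsymbol{\lambda} \ge \mathbf{0}$. How is this done? Then
$\mathbf{x}^* = \tfrac{1}{2}(\mathbf{x}_1 + \mathbf{x}_2)$.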

6-55. Show that the general linear programming problem, as defined by
Eqs. (1-12), (1-13), can be converted into an equivalent linear programming
problem $Ax = b$, $x \geq 0$, max or min $z = cx$, where "equivalent" means that
both problems have the same set of optimal solutions. Note that in the new
form the constraints are a set of simultaneous linear equations. Hint: For any
inequality in Eq. (1-13) of the form $\sum_{j=1}^{r} a_{ij}x_j \leq b_i$, define a new non-negative
variable $x_{r+i}$ (called a slack variable) by $x_{r+i} = b_i - \sum_{j=1}^{r} a_{ij}x_j$. What is
the physical interpretation of a slack variable? For any inequality in Eq. (1-13)
of the form $\sum_{j=1}^{r} a_{ij}x_j \geq b_i$, define a new non-negative variable $x_{r+i}$ (called a
surplus variable) by $x_{r+i} = \sum_{j=1}^{r} a_{ij}x_j - b_i$. What is the physical interpretation
of a surplus variable? What prices should be assigned to slack and surplus
variables?

6-56. Show that if the linear programming problem $Ax = b$, $x \geq 0$, max
$z = cx$ has an optimal solution, then at least one of the basic feasible solutions
will be optimal.


CHAPTER 7 


CHARACTERISTIC VALUE PROBLEMS 
AND QUADRATIC FORMS* 

"Siftings on siftings in oblivion,
Till change hath broken down
All things ..."

Ezra Pound, Hugh Selwyn Mauberley.

7-1 Characteristic value problems. A problem which arises frequently
in applications of linear algebra is that of finding values of a scalar
parameter $\lambda$ for which there exist vectors $x \neq 0$ satisfying

$$Ax = \lambda x, \tag{7-1}$$

where $A$ is a given nth-order matrix. Such a problem is called a
characteristic value (eigenvalue, or proper value) problem. If $x \neq 0$ satisfies (7-1)
for a given $\lambda$, then $A$ operating on $x$ yields a vector which is a scalar
multiple of $x$.

Clearly, $x = 0$ is one solution of (7-1) for any $\lambda$; however, this trivial
solution is not of interest. We are looking for vectors $x \neq 0$ which satisfy
(7-1). Now (7-1) can be written

$$Ax = \lambda I x,$$

or

$$(A - \lambda I)x = 0. \tag{7-2}$$

If we choose a given $\lambda$, then any $x$ which satisfies (7-1) must satisfy the set
of n homogeneous linear equations in n unknowns (7-2). There will be
a solution $x \neq 0$ to (7-2) if and only if

$$|A - \lambda I| = 0, \tag{7-3}$$

that is, if and only if

$$\begin{vmatrix} a_{11} - \lambda & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} - \lambda & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} - \lambda \end{vmatrix} = 0. \tag{7-4}$$

* This chapter is based on the assumption that the reader has studied
Sections 2-11, 2-12, and Section 3-16 or Problem 4-18.

Obviously, this determinant is a polynomial in $\lambda$. The highest-order term
in $\lambda$ comes from the product of the diagonal elements. Thus $(-\lambda)^n$ is
the highest-order term and $|A - \lambda I|$ is an nth-degree polynomial. We
can write

$$f(\lambda) = |A - \lambda I| = (-\lambda)^n + b_{n-1}(-\lambda)^{n-1} + \cdots + b_1(-\lambda) + b_0. \tag{7-5}$$

Equation (7-4) is called the characteristic, or secular, equation* for the matrix
$A$, and $f(\lambda)$ is called the characteristic polynomial for $A$.

From the fundamental theorem of algebra we know that the nth-degree
equation $f(\lambda) = 0$ has n roots. Not all these roots need to be different,
but if a root is counted a number of times equal to its multiplicity, there
are n roots, which may be either real or complex numbers. It follows
that there cannot be more than n different values of $\lambda$ for which
$|A - \lambda I| = 0$. For values of $\lambda$ different from the roots of $f(\lambda) = 0$,
the only solution to (7-1) is $x = 0$. If $\lambda$ is set equal to one of the roots
$\lambda_i$, then $|A - \lambda_i I| = 0$, and there is at least one $x \neq 0$ which satisfies
(7-1). The maximum number of linearly independent vectors $x$ which
satisfy (7-1) when $\lambda = \lambda_i$ will be the nullity of $A - \lambda_i I$. The roots of the
characteristic equation, which will be denoted by $\lambda_i$, $i = 1, \ldots, n$, are
called the characteristic values, eigenvalues, proper values, or latent roots of
the matrix $A$. The vectors $x \neq 0$ which satisfy (7-1) are called characteristic
vectors or eigenvectors of the matrix $A$.

We can write the polynomial $f(\lambda)$ in factored form, using the roots of
$f(\lambda) = 0$, that is,

$$f(\lambda) = (\lambda_1 - \lambda)(\lambda_2 - \lambda) \cdots (\lambda_n - \lambda). \tag{7-6}$$

Comparison of (7-6) and (7-5) yields the well-known relations between
the roots and coefficients of a polynomial (see Problem 7-1 for details):

$$\begin{aligned}
b_{n-1} &= \sum_{i=1}^{n} \lambda_i = \lambda_1 + \lambda_2 + \cdots + \lambda_n, \\
b_{n-2} &= \sum_{i} \sum_{j>i} \lambda_i\lambda_j = \lambda_1\lambda_2 + \cdots + \lambda_1\lambda_n + \lambda_2\lambda_3 + \cdots + \lambda_2\lambda_n + \cdots + \lambda_{n-1}\lambda_n, \\
b_{n-r} &= \sum_{i} \sum_{j>i} \cdots \sum_{k>\cdots} \lambda_i\lambda_j \cdots \lambda_k \quad \text{(each term is a product of $r$ of the $\lambda_i$)}, \\
b_0 &= \lambda_1\lambda_2\lambda_3 \cdots \lambda_n.
\end{aligned} \tag{7-7}$$

* The name "secular equation" arose because Eq. (7-4) appears in the theory
of secular perturbations in astronomy.

There are also relations between the $b_j$ and the elements of $A$; in
particular, if we set $\lambda = 0$ in (7-5), we see that $b_0 = |A|$. The derivation of
the other $b_j$ in terms of the $a_{ij}$ is to be supplied in Problem 7-2.

To obtain numerical values for the eigenvalues of $A$, it is necessary to
solve the characteristic equation $f(\lambda) = 0$. This can be a difficult
undertaking, especially if the equation is of high degree (say 3 or higher). In
fact, a great deal of work is involved even in computing the coefficients
$b_j$ of $f(\lambda)$ [see Eq. (7-5)] from the $a_{ij}$ of $A$. We shall not concentrate on
numerical methods for finding the eigenvalues of a matrix; instead we shall
emphasize the theoretical development of the subject.

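Example: The characteristic polynomial for a second-order matrix
$A = \|a_{ij}\|$ is

$$\begin{aligned}
f(\lambda) &= \begin{vmatrix} a_{11} - \lambda & a_{12} \\ a_{21} & a_{22} - \lambda \end{vmatrix} = (a_{11} - \lambda)(a_{22} - \lambda) - a_{12}a_{21} \\
&= (-\lambda)^2 + (a_{11} + a_{22})(-\lambda) + a_{11}a_{22} - a_{12}a_{21} \\
&= (-\lambda)^2 + b_1(-\lambda) + b_0;
\end{aligned}$$

hence

$$b_1 = a_{11} + a_{22}, \qquad b_0 = a_{11}a_{22} - a_{12}a_{21} = |A|.$$

The two roots of $f(\lambda) = 0$ are

$$\lambda = \tfrac{1}{2}\left\{(a_{11} + a_{22}) \pm \left[(a_{11} + a_{22})^2 - 4|A|\right]^{1/2}\right\}.$$

If $\lambda_1$, $\lambda_2$ denote these roots, then

$$\lambda_1\lambda_2 = |A| = b_0, \qquad \lambda_1 + \lambda_2 = a_{11} + a_{22} = b_1.$$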
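The closed-form roots of the second-order example can be checked numerically. The following is a minimal sketch in Python (standard library only); the function name `eigenvalues_2x2` and the two sample matrices are illustrative assumptions, not part of the text. The complex square root is used so that a negative discriminant produces a complex-conjugate pair of roots.

```python
import cmath

def eigenvalues_2x2(a11, a12, a21, a22):
    """Roots of f(lambda) = (-lambda)^2 + b1*(-lambda) + b0 for a 2x2 matrix."""
    b1 = a11 + a22                      # sum of the diagonal elements
    b0 = a11 * a22 - a12 * a21          # determinant |A|
    root = cmath.sqrt(b1 * b1 - 4 * b0) # complex sqrt covers a negative discriminant
    return (b1 + root) / 2, (b1 - root) / 2

# Symmetric sample matrix [[2, 1], [1, 2]]: both roots are real.
lam1, lam2 = eigenvalues_2x2(2.0, 1.0, 1.0, 2.0)

# Nonsymmetric sample matrix [[0, -1], [1, 0]]: the roots are complex.
mu1, mu2 = eigenvalues_2x2(0.0, -1.0, 1.0, 0.0)

print(lam1, lam2, mu1, mu2)
```

For either matrix the product of the two returned roots equals $|A|$ and their sum equals $a_{11} + a_{22}$, in agreement with the relations above.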


For most characteristic value problems of physical interest which have 
matrices A whose elements are real numbers, it turns out that A is also a 
symmetric matrix. The theory of eigenvalue problems involving sym¬ 
metric matrices A is thus very important. Interestingly enough, the 
theory is much simpler in this case than in that of a nonsymmetric matrix 
A. However, prior to discussing topics related to characteristic value 
problems for symmetric matrices, we shall introduce the notion of the 
similarity of matrices, which will be needed in the following sections. 


7-2 Similarity. Suppose that $x$ is an eigenvector of $A$ corresponding to
the eigenvalue $\lambda$ and that $P$ is an nth-order nonsingular matrix. Then
the vector $y = Px$ will not, in general, be an eigenvector of $A$
corresponding to the eigenvalue $\lambda$ since on multiplying (7-1) on the left
by $P$, we obtain

$$PAx = \lambda Px, \tag{7-8}$$

which is not the same as $APx = \lambda Px$. However, $x = P^{-1}Px$.
Substituting this into (7-8), we obtain

$$PAP^{-1}Px = \lambda Px, \tag{7-9}$$

or

$$PAP^{-1}y = \lambda y.$$

Thus $y$ is an eigenvector of the matrix $PAP^{-1}$, corresponding to the
eigenvalue $\lambda$. We have shown that if $\lambda$ is an eigenvalue of $A$, then $\lambda$ is
also an eigenvalue of $PAP^{-1}$ for any nth-order nonsingular matrix $P$. If
$B = PAP^{-1}$, then $A = P^{-1}BP$, and $x = P^{-1}y$. This demonstrates that
any eigenvalue of $B$ must also be an eigenvalue of $A$. Hence the matrices
$A$, $B$ have identical sets of eigenvalues and are called similar matrices.

Similarity: If there exists a nonsingular matrix $P$ such that $B = PAP^{-1}$,
the square matrices $A$ and $B$ are said to be similar.

If $B = PAP^{-1}$, we say that $B$ is obtained by a similarity transformation*
on $A$. A similarity transformation is a special case of an equivalence
transformation, defined in Section 4-6. If $B$ is similar to $A$, $B$ is also equivalent
to $A$.

Since similar matrices have the same set of eigenvalues and the
characteristic polynomial of a matrix can be written in the form (7-6), it
follows that similar matrices have the same characteristic polynomial, that
is, for any value of $\lambda$, $f_A(\lambda) = f_B(\lambda)$, where $f_A(\lambda)$ and $f_B(\lambda)$ are the
characteristic polynomials for $A$, $B$, respectively. This can also be seen
directly, since $|PAP^{-1} - \lambda I| = |P(A - \lambda I)P^{-1}| = |P|\,|A - \lambda I|\,|P^{-1}| = |A - \lambda I|$.
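As a small numerical illustration of this result, the sketch below forms $B = PAP^{-1}$ for a second-order matrix and compares the coefficients $b_1$, $b_0$ of the two characteristic polynomials. The matrices `A` and `P` and all helper names are assumptions chosen for the illustration; exact rational arithmetic is used so that the comparison is not affected by rounding.

```python
from fractions import Fraction

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def inverse_2x2(P):
    """Inverse of a nonsingular 2x2 matrix."""
    det = P[0][0] * P[1][1] - P[0][1] * P[1][0]
    if det == 0:
        raise ValueError("P must be nonsingular")
    return [[P[1][1] / det, -P[0][1] / det],
            [-P[1][0] / det, P[0][0] / det]]

def char_coeffs_2x2(M):
    """Coefficients (b1, b0) of f(lambda) = (-lambda)^2 + b1(-lambda) + b0."""
    return (M[0][0] + M[1][1], M[0][0] * M[1][1] - M[0][1] * M[1][0])

# Illustrative matrices (not taken from the text).
A = [[Fraction(4), Fraction(1)], [Fraction(2), Fraction(3)]]
P = [[Fraction(1), Fraction(2)], [Fraction(3), Fraction(5)]]

B = matmul(matmul(P, A), inverse_2x2(P))   # B = P A P^{-1}

print(char_coeffs_2x2(A), char_coeffs_2x2(B))
```

Although `B` has entirely different elements from `A`, the two coefficient pairs agree, so the two matrices share one characteristic polynomial.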

7-3 Characteristic value problems for symmetric matrices. In general,
the eigenvalues of a matrix $A$ need not be real numbers; they may be
complex.† We have been assuming that the elements of $A$ are real
numbers. It does not follow that the roots of (7-4) will be real numbers since
the roots of a polynomial equation with real coefficients may be complex.
However, if $A$ in (7-1) is a symmetric matrix, we can easily show that
the eigenvalues of $A$ are real.

* Note that if $B = PAP^{-1}$, then $B = R^{-1}AR$, where $R = P^{-1}$. Thus
either $PAP^{-1}$ or $P^{-1}AP$ represents a similarity transformation on $A$. We shall
call $B$ similar to $A$ if $B = PAP^{-1}$ for some $P$ or $B = R^{-1}AR$ for some $R$. The
definitions are equivalent since the substitution $R = P^{-1}$ converts one form
into the other.

† If an eigenvalue is complex, the components of any eigenvector
corresponding to this eigenvalue cannot all be real. The reader should note that everything
that was stated in Sections 7-1 and 7-2 is still true since the general theory
developed in Chapters 2 through 5 also applies to matrices with complex elements.

To carry out the proof, it is necessary to use several simple properties
of complex numbers. (We suggest that the reader refresh his memory by
examining the problems on complex numbers at the end of Chapters 2
and 3.) Assume that $\lambda$ is a real or complex eigenvalue of the symmetric
matrix $A$. Then there will be at least one eigenvector (which may have
complex components) such that

$$Ax = \lambda x. \tag{7-10}$$

Taking the complex conjugate of (7-10) and recalling that the elements
of $A$ are real, we obtain

$$Ax^* = \lambda^* x^*, \tag{7-11}$$

where $*$ denotes the complex conjugate, and $x^* = [x_1^*, \ldots, x_n^*]$.
Multiplying (7-10) by $(x^*)'$, (7-11) by $x'$, and subtracting, we have

$$(x^*)'Ax - x'Ax^* = (\lambda - \lambda^*)x'x^* \tag{7-12}$$

since $(x^*)'x = x'x^*$. Recall that $A' = A$. Furthermore, $x'Ax^*$ is a
number, and the transpose of a number (matrix of a single element) is itself
the number. Thus

$$x'Ax^* = (x'Ax^*)' = (x^*)'Ax,$$

so that the left-hand side of (7-12) vanishes. In addition,
$x'x^* = \sum_{i=1}^{n} x_i x_i^*$ is real and positive since $x \neq 0$.
Consequently, $\lambda = \lambda^*$, and $\lambda$ is real; that is, if $\lambda = a + bi$, then $\lambda^* = a - bi$,
and $\lambda = \lambda^*$ implies $b = 0$.

Each eigenvalue of a symmetric matrix is real; hence the components 
of the eigenvectors will also be real because a set of homogeneous linear 
equations with real coefficients yields solutions x whose components are 
real. It is interesting to note that we did not have to use the character¬ 
istic equation in order to prove that its roots were real. 

When $A$ is symmetric, another important result follows: The eigenvectors
corresponding to different eigenvalues are orthogonal; i.e., if $x_i$ is an
eigenvector corresponding to eigenvalue $\lambda_i$, and $x_j$ is an eigenvector
corresponding to eigenvalue $\lambda_j$ ($\lambda_i \neq \lambda_j$), then $x_i'x_j = 0$. To prove this, we observe
first that, by assumption,

$$Ax_i = \lambda_i x_i \qquad \text{and} \qquad Ax_j = \lambda_j x_j.$$

Thus

$$x_j'Ax_i = \lambda_i x_j'x_i \qquad \text{and} \qquad x_i'Ax_j = \lambda_j x_i'x_j. \tag{7-13}$$

Subtracting and noting that $x_j'Ax_i = x_i'Ax_j$ and $x_j'x_i = x_i'x_j$, we have

$$(\lambda_i - \lambda_j)x_j'x_i = 0, \tag{7-14}$$

or

$$x_j'x_i = 0, \quad \text{since } \lambda_i \neq \lambda_j.$$

If $x \neq 0$ is an eigenvector of $A$, then any nonzero scalar multiple of $x$ is also
an eigenvector of $A$, corresponding to the same eigenvalue. In general,
the length of an eigenvector is not of interest. For convenience, we shall
always assume that the eigenvectors of a symmetric matrix are of unit
length; they will be denoted by $u_i$. When the eigenvectors are of unit
length, they are said to have been normalized. A set of two or more
normalized eigenvectors $u_i$ corresponding to different eigenvalues of $A$
satisfies the equation $u_i'u_j = \delta_{ij}$. Any set of vectors satisfying such an
equation is said to be orthonormal.

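Example: Find the eigenvalues and eigenvectors of

$$A = \begin{bmatrix} 2 & \sqrt{2} \\ \sqrt{2} & 1 \end{bmatrix}.$$

The characteristic equation is

$$|A - \lambda I| = \begin{vmatrix} 2 - \lambda & \sqrt{2} \\ \sqrt{2} & 1 - \lambda \end{vmatrix} = \lambda^2 - 3\lambda = 0,$$

whence the eigenvalues are

$$\lambda_1 = 0, \qquad \lambda_2 = 3.$$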


To determine the eigenvectors corresponding to $\lambda_i$, we must solve the set
of homogeneous equations $(A - \lambda_i I)x = 0$. Let us take $\lambda_1 = 0$ first;
then the set of equations becomes

$$2x_1 + \sqrt{2}\,x_2 = 0, \qquad \sqrt{2}\,x_1 + x_2 = 0,$$

or

$$x_1 = -\frac{1}{\sqrt{2}}\,x_2.$$

If we wish to find an eigenvector of unit length, we must require that
$x_1^2 + x_2^2 = 1$. Thus $(3/2)x_2^2 = 1$; choosing the positive square root, we
obtain

$$x_2 = \sqrt{\tfrac{2}{3}}, \qquad x_1 = -\frac{1}{\sqrt{3}},$$

and the eigenvector of unit length corresponding to $\lambda_1$ is

$$u_1 = \left[-\frac{1}{\sqrt{3}},\ \sqrt{\tfrac{2}{3}}\right].$$

Note that $u_1$ is not completely specified by the requirement of $|u_1| = 1$.
The vector $-u_1$ is also an eigenvector of length 1. However, $u_1$ and
$-u_1$ are not linearly independent. Only one linearly independent
eigenvector corresponds to $\lambda_1$.

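For $\lambda_2 = 3$, the set of equations becomes

$$-x_1 + \sqrt{2}\,x_2 = 0, \qquad \sqrt{2}\,x_1 - 2x_2 = 0,$$

or

$$x_1 = \sqrt{2}\,x_2.$$

If $x_1^2 + x_2^2 = 1$, then $3x_2^2 = 1$; taking the positive square root, we have

$$x_2 = \frac{1}{\sqrt{3}}, \qquad x_1 = \sqrt{\tfrac{2}{3}},$$

and

$$u_2 = \left[\sqrt{\tfrac{2}{3}},\ \frac{1}{\sqrt{3}}\right].$$

Again, there is only one linearly independent eigenvector which
corresponds to $\lambda_2$. It can be easily checked that $u_2'u_1 = 0$, in agreement with
the theory.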
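The check suggested at the end of this example can be carried out numerically. The sketch below is only an illustration in Python's standard library; the helper names `matvec`, `dot` and `residual` are assumptions introduced here. It verifies that $Au_1 = 0 \cdot u_1$, that $Au_2 = 3u_2$, and that $u_1$, $u_2$ are orthonormal, all to within floating-point rounding.

```python
import math

A = [[2.0, math.sqrt(2.0)],
     [math.sqrt(2.0), 1.0]]

# Unit eigenvectors found in the example above.
u1 = [-1.0 / math.sqrt(3.0), math.sqrt(2.0 / 3.0)]   # eigenvalue 0
u2 = [math.sqrt(2.0 / 3.0), 1.0 / math.sqrt(3.0)]    # eigenvalue 3

def matvec(M, v):
    """Matrix-vector product M v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def dot(a, b):
    """Scalar product a'b."""
    return sum(x * y for x, y in zip(a, b))

def residual(M, lam, v):
    """Largest absolute component of M v - lam v (zero for an exact eigenpair)."""
    return max(abs(mv - lam * x) for mv, x in zip(matvec(M, v), v))

print(residual(A, 0.0, u1), residual(A, 3.0, u2), dot(u1, u2))
```

All three printed quantities are zero up to rounding error, which is what the theory for symmetric matrices predicts.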

7-4 Additional properties of the eigenvectors of a symmetric matrix.
Suppose that all the eigenvalues of an nth-order symmetric matrix are
different. Then there exists a set of n vectors $u_i$ (one for each
eigenvalue $\lambda_i$) such that

$$u_i'u_j = \delta_{ij} \quad (\text{all } i, j = 1, \ldots, n), \tag{7-15}$$

where $\delta_{ij}$ is the Kronecker delta. This follows because eigenvectors
corresponding to different eigenvalues are orthogonal. Thus the
$u_i$ ($i = 1, \ldots, n$) form an orthonormal basis for $E^n$. Hence, when the
eigenvalues are all different, no more than one linearly independent
eigenvector can correspond to a given eigenvalue. If there were two linearly
independent eigenvectors corresponding to a given eigenvalue, both would
have to be orthogonal to $n - 1$ orthonormal eigenvectors belonging to
the other eigenvalues. However, we know from Section 2-11 that if two
nonzero vectors from $E^n$ are orthogonal to an orthonormal set of $n - 1$
vectors, then one is a scalar multiple of the other, and they are not linearly
independent. Thus a contradiction is obtained.

Let us now consider the case where the eigenvalues of $A$ are not all
distinct. We shall show: (1) If an eigenvalue $\lambda_j$ of the nth-order symmetric
matrix $A$ has multiplicity $k \geq 2$, there exist $k$ orthonormal (and linearly
independent) eigenvectors with eigenvalue $\lambda_j$. In fact, there exist an
infinite number of sets of $k$ orthonormal eigenvectors corresponding to
$\lambda_j$. (2) There cannot be more than $k$ linearly independent eigenvectors
with the same eigenvalue $\lambda_j$; hence, if an eigenvalue has multiplicity $k$, the
eigenvectors with eigenvalue $\lambda_j$ span a subspace of $E^n$ of dimension $k$.
Then if the sets of eigenvectors corresponding to all the different
eigenvalues are combined, it is possible to obtain an orthonormal basis for $E^n$.

To prove that there exist $k$ linearly independent eigenvectors
corresponding to an eigenvalue $\lambda_j$ of multiplicity $k$, we must show that the
nullity of $A - \lambda_j I$ is greater than, or equal to, $k$. To do this, we begin by
noting that there will be at least one eigenvector with eigenvalue $\lambda_j$,
say $u_j$. From Section 2-11 we know that there exist $n - 1$ vectors
$v_1, \ldots, v_{n-1}$ such that the set $u_j, v_1, \ldots, v_{n-1}$ is an orthonormal basis
for $E^n$. Consider the matrix

$$Q_1 = (u_j, v_1, \ldots, v_{n-1}); \tag{7-16}$$

then, since $Au_j = \lambda_j u_j$, $A' = A$, and the columns of $Q_1$ are orthonormal,

$$A_1 = Q_1'AQ_1 = \begin{bmatrix} \lambda_j & 0 \\ 0 & \alpha \end{bmatrix}, \tag{7-19}$$

where $\alpha = \|v_i'Av_s\|$ is a symmetric matrix of order $n - 1$.

Now observe that

$$Q_1'Q_1 = I \quad \text{or} \quad Q_1^{-1} = Q_1'; \tag{7-20}$$

that is, the inverse of $Q_1$ is equal to its transpose. Hence $A_1$ is similar to
$A$, and $A_1$, $A$ have the same set of eigenvalues. From (7-19)

$$|A_1 - \lambda I_n| = (\lambda_j - \lambda)\,|\alpha - \lambda I_{n-1}|. \tag{7-21}$$


Thus if $\lambda_j$ is an eigenvalue of $A$ with multiplicity $k \geq 2$, it must be true
that $|\alpha - \lambda_j I_{n-1}| = 0$. Hence all minors of

$$A_1 - \lambda_j I_n = \begin{bmatrix} 0 & 0 \\ 0 & \alpha - \lambda_j I_{n-1} \end{bmatrix} \tag{7-22}$$

of order $n - 1$ vanish, and the nullity of $A_1 - \lambda_j I$ is $\geq 2$. Since
$r(A - \lambda_j I) = r(A_1 - \lambda_j I)$, the nullity of $A - \lambda_j I$ is $\geq 2$. Consequently,
there exists an eigenvector $\hat{u}_j$ of $A$ with eigenvalue $\lambda_j$ which is linearly
independent of, and orthogonal to, $u_j$.

If the multiplicity $k = 2$, our development is completed. If $k \geq 3$, the
above procedure is repeated: There exist $n - 2$ vectors $v_1, \ldots, v_{n-2}$ such
that $u_j, \hat{u}_j, v_1, \ldots, v_{n-2}$ is an orthonormal basis for $E^n$. If
$Q_2 = (u_j, \hat{u}_j, v_1, \ldots, v_{n-2})$, then

$$Q_2^{-1} = Q_2', \tag{7-23}$$

and

$$A_2 = Q_2'AQ_2 = \begin{bmatrix} \lambda_j & 0 & 0 \\ 0 & \lambda_j & 0 \\ 0 & 0 & \beta \end{bmatrix}, \tag{7-24}$$

where $\beta = \|v_i'Av_s\|$ is a symmetric matrix of order $n - 2$. Then
$|A_2 - \lambda I| = (\lambda_j - \lambda)^2\,|\beta - \lambda I_{n-2}|$, and since $k \geq 3$, $|\beta - \lambda_j I_{n-2}| = 0$,
and all minors of $A_2 - \lambda_j I$ of order $n - 2$ vanish. Hence the nullity of
$A - \lambda_j I$ is $\geq 3$, and there exists an eigenvector with eigenvalue $\lambda_j$ which
is orthogonal to $u_j$, $\hat{u}_j$. In this way one shows that if the multiplicity of
$\lambda_j$ is $k$, there exist at least $k$ orthonormal eigenvectors with eigenvalue $\lambda_j$.

Now it is also true that there cannot be more than $k$ orthonormal
eigenvectors with eigenvalue $\lambda_j$ if $\lambda_j$ has multiplicity $k$. This follows
because each eigenvector corresponding to $\lambda_j$ is orthogonal to every
eigenvector corresponding to an eigenvalue different from $\lambda_j$. The
foregoing results have shown that if we sum over all different eigenvalues, we
obtain an orthonormal set containing at least n vectors. However, there
cannot be more than n orthonormal vectors in $E^n$, and hence the
eigenvectors of an eigenvalue of multiplicity $k$ span a $k$-dimensional subspace
of $E^n$.

In summary, we have proved:

(1) The eigenvectors of an nth-order symmetric matrix $A$ span $E^n$.

(2) There exists at least one orthonormal set of eigenvectors of $A$ which
span $E^n$.

(3) If an eigenvalue $\lambda_j$ has multiplicity $k$, there will be exactly $k$
eigenvectors with eigenvalue $\lambda_j$ in any set of n orthonormal eigenvectors
of $A$.

(4) If an eigenvalue $\lambda_j$ has multiplicity $k$, the eigenvectors
corresponding to $\lambda_j$ span a subspace of $E^n$ of dimension $k$.

(5) If one or more eigenvalues have multiplicity $k \geq 2$, there will be
an infinite number of different sets of orthonormal eigenvectors of $A$
which span $E^n$, corresponding to the different ways of selecting
orthonormal sets to span the subspaces with dimension $k \geq 2$.


Example: Find a set of three orthonormal eigenvectors for the
symmetric matrix

$$A = \begin{bmatrix} 3 & 0 & 0 \\ 0 & 4 & \sqrt{3} \\ 0 & \sqrt{3} & 6 \end{bmatrix}.$$

The characteristic polynomial is

$$f(\lambda) = (3 - \lambda)[(4 - \lambda)(6 - \lambda) - 3] = (3 - \lambda)(\lambda^2 - 10\lambda + 21),$$

and the eigenvalues are

$$\lambda = 3 \ (\text{twice}), \ 7.$$

The eigenvalue 3 has multiplicity 2. Write $\lambda_1 = 7$, $\lambda_2 = 3$.

There will be only one linearly independent eigenvector with
eigenvalue 7; it is a solution to $(A - 7I)x = 0$, that is,

$$-4x_1 = 0, \qquad -3x_2 + \sqrt{3}\,x_3 = 0, \qquad \sqrt{3}\,x_2 - x_3 = 0,$$

whence

$$x_1 = 0, \qquad x_2 = \frac{1}{\sqrt{3}}\,x_3.$$

If $\sum x_i^2 = 1$, then $(4/3)x_3^2 = 1$; choosing the positive square root, we
obtain

$$x_3 = \frac{\sqrt{3}}{2}, \qquad x_2 = \frac{1}{2}.$$

Thus

$$u_1 = \left[0,\ \frac{1}{2},\ \frac{\sqrt{3}}{2}\right]$$

is an eigenvector of unit length with eigenvalue 7. The only other
eigenvector of unit length with eigenvalue 7 is $-u_1$.

Since $\lambda_2 = 3$ has multiplicity 2, there should be two orthonormal
eigenvectors with eigenvalue $\lambda_2$. In fact, there should exist an infinite number
of sets of two orthonormal eigenvectors with eigenvalue 3. Any
eigenvector corresponding to $\lambda_2$ must satisfy the set of equations $(A - 3I)x = 0$,
that is,

$$x_2 + \sqrt{3}\,x_3 = 0, \qquad \sqrt{3}\,x_2 + 3x_3 = 0,$$

or

$$x_2 = -\sqrt{3}\,x_3, \qquad x_1 \text{ arbitrary}. \tag{7-25}$$

If $\sum x_i^2 = 1$, then $x_1^2 + 4x_3^2 = 1$.

Let us choose $x_1 = 0$: Then, taking the positive square root, we obtain

$$x_3 = \frac{1}{2}, \qquad x_2 = -\frac{\sqrt{3}}{2};$$

hence

$$u_2 = \left[0,\ -\frac{\sqrt{3}}{2},\ \frac{1}{2}\right]$$

is an eigenvector with eigenvalue $\lambda_2$.

We now wish to find another eigenvector with eigenvalue $\lambda_2$ which is
orthogonal to $u_2$. We can do this by annexing

$$u_2'x = -\frac{\sqrt{3}}{2}\,x_2 + \frac{1}{2}\,x_3 = 0 \tag{7-26}$$

to the above set of equations. This procedure will automatically ensure
that the new solution is orthogonal to $u_2$. Equation (7-26) requires that
$x_2 = (1/\sqrt{3})x_3$ which, together with (7-25), implies that $x_2 = x_3 = 0$.
However, we want $\sum x_i^2 = 1$, and thus $x_1 = \pm 1$. Selecting the positive
value, we see that

$$\hat{u}_2 = [1, 0, 0]$$

is an eigenvector with eigenvalue $\lambda_2$ which is orthogonal to $u_2$. Note that
both $u_2$ and $\hat{u}_2$ are automatically orthogonal to $u_1$; hence $u_1$, $u_2$, $\hat{u}_2$ form
an orthonormal basis for $E^3$ (illustrate geometrically).

It is easy to find a different set of orthonormal eigenvectors for $A$
which span $E^3$. If we set $x_1 = \tfrac{1}{2}$ in (7-25), then

$$x_3 = \frac{\sqrt{3}}{4}, \qquad x_2 = -\frac{3}{4};$$

and

$$u_2^* = \left[\frac{1}{2},\ -\frac{3}{4},\ \frac{\sqrt{3}}{4}\right]$$

is an eigenvector with eigenvalue $\lambda_2$. Now we annex

$$(u_2^*)'x = \frac{1}{2}\,x_1 - \frac{3}{4}\,x_2 + \frac{\sqrt{3}}{4}\,x_3 = 0 \tag{7-27}$$

to (7-25) in order to obtain another eigenvector orthogonal to $u_2^*$.
Substituting $x_2 = -\sqrt{3}\,x_3$ into (7-27), we obtain

$$\frac{1}{2}\,x_1 + \sqrt{3}\,x_3 = 0 \quad \text{or} \quad x_1 = -2\sqrt{3}\,x_3.$$

Since $\sum x_i^2 = 1$, $16x_3^2 = 1$; choosing the positive square root, we obtain

$$x_3 = \frac{1}{4}, \qquad x_2 = -\frac{\sqrt{3}}{4}, \qquad x_1 = -\frac{\sqrt{3}}{2},$$

whence

$$\hat{u}_2^* = \left[-\frac{\sqrt{3}}{2},\ -\frac{\sqrt{3}}{4},\ \frac{1}{4}\right]$$

is an eigenvector with eigenvalue $\lambda_2$ which is orthogonal to $u_2^*$; thus the
set $u_1$, $u_2^*$, $\hat{u}_2^*$ is an orthonormal basis (different from $u_1$, $u_2$, $\hat{u}_2$) for $E^3$.
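Both orthonormal sets found in this example can be verified with a few lines of code. The following is a sketch in plain Python; the names `basis_a`, `basis_b`, `is_orthonormal` and `are_eigenvectors` are assumptions made for the illustration. Each set is tested for orthonormality and each vector is tested against its eigenvalue (7, 3, 3 in that order).

```python
import math

s3 = math.sqrt(3.0)
A = [[3.0, 0.0, 0.0],
     [0.0, 4.0, s3],
     [0.0, s3, 6.0]]

u1 = [0.0, 0.5, s3 / 2]                                    # eigenvalue 7
basis_a = [u1, [0.0, -s3 / 2, 0.5], [1.0, 0.0, 0.0]]       # first set for eigenvalue 3
basis_b = [u1, [0.5, -0.75, s3 / 4], [-s3 / 2, -s3 / 4, 0.25]]  # second set
eigenvalues = [7.0, 3.0, 3.0]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def matvec(M, v):
    return [dot(row, v) for row in M]

def is_orthonormal(vectors, tol=1e-12):
    """True if u_i'u_j equals the Kronecker delta for every pair."""
    return all(abs(dot(a, b) - (1.0 if i == j else 0.0)) < tol
               for i, a in enumerate(vectors) for j, b in enumerate(vectors))

def are_eigenvectors(M, lams, vectors, tol=1e-12):
    """True if M v = lam v holds componentwise for each (lam, v) pair."""
    return all(abs(mv - lam * x) < tol
               for lam, v in zip(lams, vectors)
               for mv, x in zip(matvec(M, v), v))

for basis in (basis_a, basis_b):
    print(is_orthonormal(basis), are_eigenvectors(A, eigenvalues, basis))
```

Any other unit vector satisfying (7-25) could be used as the starting point for the eigenvalue 3, which is the sense in which infinitely many such sets exist.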


7-5 Diagonalization of symmetric matrices. Let the n eigenvalues of
the nth-order symmetric matrix $A$ be $\lambda_1, \ldots, \lambda_n$. In this listing, an
eigenvalue is repeated a number of times equal to its multiplicity. Thus if one
eigenvalue has multiplicity $k$, there will be $k$ eigenvalues $\lambda_j$ with the same
numerical value. In the last section, we saw that each $\lambda_j$ has a
corresponding eigenvector $u_j$ such that the set $u_1, \ldots, u_n$ is an orthonormal basis for
$E^n$. There is at least one such set of $u_j$, and there may be an infinite
number of different sets. Since these eigenvectors $u_j$ are orthonormal,

$$u_i'u_j = \delta_{ij} \quad (\text{all } i, j). \tag{7-28}$$

Next, let us consider the matrix $Q = (u_1, \ldots, u_n)$ whose columns
are an orthonormal set of eigenvectors for $A$. Matrix $Q$ has the following
property:

$$Q'Q = \|u_i'u_j\| = \|\delta_{ij}\| = I, \tag{7-29}$$

whence

$$Q^{-1} = Q', \tag{7-30}$$

i.e., the inverse of $Q$ is the transpose of $Q$.

Orthogonal matrix: A matrix $Q$ is called orthogonal if its inverse is
its transpose, that is, $Q^{-1} = Q'$.

Now

$$Q'AQ = \|u_i'Au_j\| = \|\lambda_j u_i'u_j\| = \|\lambda_j\delta_{ij}\|. \tag{7-31}$$

Thus $Q'AQ$ is a diagonal matrix whose diagonal elements are the
eigenvalues of $A$. If we write

$$D = \|\lambda_j\delta_{ij}\|, \tag{7-32}$$

then

$$Q^{-1}AQ = Q'AQ = D, \tag{7-33}$$

and $A$ is similar to $D$. We say that $D$ is obtained by performing an
orthogonal similarity transformation on $A$, and that the similarity
transformation diagonalizes $A$.

We have just proved the important result that any symmetric matrix
$A$ can be diagonalized by an orthogonal similarity transformation.
Furthermore, the matrix $Q$ which is used to diagonalize $A$ has as its columns an
orthonormal set of eigenvectors for $A$. The resulting diagonal matrix has as
its diagonal elements the eigenvalues of $A$.


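Examples: (1) For the matrix

$$A = \begin{bmatrix} 2 & \sqrt{2} \\ \sqrt{2} & 1 \end{bmatrix} \quad \text{(see the example in Section 7-3)},$$

$$Q = (u_1, u_2) = \frac{1}{\sqrt{3}} \begin{bmatrix} -1 & \sqrt{2} \\ \sqrt{2} & 1 \end{bmatrix},$$

and

$$\begin{aligned}
Q'AQ &= \frac{1}{3} \begin{bmatrix} -1 & \sqrt{2} \\ \sqrt{2} & 1 \end{bmatrix} \begin{bmatrix} 2 & \sqrt{2} \\ \sqrt{2} & 1 \end{bmatrix} \begin{bmatrix} -1 & \sqrt{2} \\ \sqrt{2} & 1 \end{bmatrix} \\
&= \frac{1}{3} \begin{bmatrix} -1 & \sqrt{2} \\ \sqrt{2} & 1 \end{bmatrix} \begin{bmatrix} 0 & 3\sqrt{2} \\ 0 & 3 \end{bmatrix} = \begin{bmatrix} 0 & 0 \\ 0 & 3 \end{bmatrix} = \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix}.
\end{aligned}$$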
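(2) For the symmetric matrix

$$A = \begin{bmatrix} 3 & 0 & 0 \\ 0 & 4 & \sqrt{3} \\ 0 & \sqrt{3} & 6 \end{bmatrix} \quad \text{(see the example in Section 7-4)},$$

$$Q = (u_1, u_2, \hat{u}_2) = \begin{bmatrix} 0 & 0 & 1 \\ \tfrac{1}{2} & -\tfrac{\sqrt{3}}{2} & 0 \\ \tfrac{\sqrt{3}}{2} & \tfrac{1}{2} & 0 \end{bmatrix},$$

$$Q'AQ = \begin{bmatrix} 0 & \tfrac{1}{2} & \tfrac{\sqrt{3}}{2} \\ 0 & -\tfrac{\sqrt{3}}{2} & \tfrac{1}{2} \\ 1 & 0 & 0 \end{bmatrix} \begin{bmatrix} 3 & 0 & 0 \\ 0 & 4 & \sqrt{3} \\ 0 & \sqrt{3} & 6 \end{bmatrix} \begin{bmatrix} 0 & 0 & 1 \\ \tfrac{1}{2} & -\tfrac{\sqrt{3}}{2} & 0 \\ \tfrac{\sqrt{3}}{2} & \tfrac{1}{2} & 0 \end{bmatrix} = \begin{bmatrix} 7 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 3 \end{bmatrix}.$$

We leave it for the reader to show that a similarity transformation
involving the matrix $Q_1 = (u_1, u_2^*, \hat{u}_2^*)$ also diagonalizes $A$.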
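The product $Q_1'AQ_1$ left to the reader can be checked numerically before it is worked by hand. The sketch below is an illustration only: the helpers `transpose` and `matmul` are assumed names, and floating-point arithmetic is used, so the off-diagonal elements come out as rounding-level values rather than exact zeros.

```python
import math

def transpose(M):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

s3 = math.sqrt(3.0)
A = [[3.0, 0.0, 0.0],
     [0.0, 4.0, s3],
     [0.0, s3, 6.0]]

# Eigenvectors u1, u2*, u2-hat* from Section 7-4, listed one per row ...
eigenvectors = [[0.0, 0.5, s3 / 2],
                [0.5, -0.75, s3 / 4],
                [-s3 / 2, -s3 / 4, 0.25]]
Q1 = transpose(eigenvectors)        # ... and placed as the columns of Q1

QtQ = matmul(transpose(Q1), Q1)     # should be the identity matrix
D = matmul(matmul(transpose(Q1), A), Q1)   # should be diag(7, 3, 3)

for row in D:
    print([round(value, 10) for value in row])
```

The diagonal of `D` lists the eigenvalues in the same order as the columns of `Q1`; reordering the columns permutes the diagonal elements accordingly.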


7-6 Characteristic value problems for nonsymmetric matrices. We 
shall not discuss in any great detail the theory of characteristic value prob¬ 
lems for nonsymmetric matrices. The theory for nonsymmetric matrices 
is not nearly so simple as that for symmetric matrices. Let us first list the 
main points of difference between eigenvalue problems of symmetric and 
nonsymmetric matrices. If the nth-order matrix is not symmetric, then: 

(1) It is not necessarily true that all the eigenvalues of A are real. 

(2) It is not necessarily true that eigenvectors corresponding to different 
eigenvalues are orthogonal. 

(3) Even if all eigenvalues are real, the eigenvectors of $A$ may not
span $E^n$.

(4) If eigenvalue $\lambda_j$ has multiplicity $k$, the nullity of $A - \lambda_j I$ is not
necessarily $k$.

(5) There may not exist any similarity transformation which diagonal¬ 
izes A. 

Some of the problems will deal with these differences between symmetric 
and nonsymmetric matrices. 

Although eigenvectors corresponding to different eigenvalues of a
nonsymmetric matrix need not be orthogonal, they are linearly independent. In
fact, any set of eigenvectors for the square matrix $A$, no two of which
correspond to the same eigenvalue, is linearly independent.* The proof is made
by assuming that such a set is linearly dependent and by obtaining a
contradiction. Let $x_1, \ldots, x_s$ be a set of eigenvectors for $A$ such that $x_j$
has eigenvalue $\lambda_j$, and no two $\lambda_j$ are equal. Suppose that the set of $x_j$ is
linearly dependent, and that the matrix $X = (x_1, \ldots, x_s)$ has rank $r < s$.
Number the eigenvectors so that $x_1, \ldots, x_r$ are linearly independent.
Any $x_j$ ($j = r + 1, \ldots, s$) can be written

$$x_j = \sum_{i=1}^{r} \alpha_i x_i, \tag{7-34}$$

and at least one $\alpha_i \neq 0$ since $x_j \neq 0$. Then

$$Ax_j = \sum_{i=1}^{r} \alpha_i Ax_i,$$

or

$$\lambda_j x_j = \sum_{i=1}^{r} \alpha_i \lambda_i x_i. \tag{7-35}$$

Multiply (7-34) by $\lambda_j$ and subtract the result from (7-35) to obtain

$$0 = \sum_{i=1}^{r} \alpha_i(\lambda_i - \lambda_j)x_i. \tag{7-36}$$

However, $\lambda_i - \lambda_j \neq 0$ for $j = r + 1, \ldots, s$, and at least one $\alpha_i \neq 0$.
Thus Eq. (7-36) indicates that $x_1, \ldots, x_r$ are linearly dependent, which
contradicts our original assumption. Therefore the set $x_1, \ldots, x_s$ cannot
be linearly dependent.

* It should be noted that this proof and the next are valid even if the
eigenvalues are complex.

Next we shall show that if an nth-order matrix $A$ has n linearly
independent eigenvectors, then there exists a similarity transformation which
diagonalizes $A$. In fact, if $X = (x_1, \ldots, x_n)$ is a matrix whose columns are
a set of n linearly independent eigenvectors, $X^{-1}AX$ is a diagonal matrix
whose diagonal elements are the eigenvalues of $A$.

To prove this, let $x_j$ be an eigenvector with eigenvalue $\lambda_j$ (not all $\lambda_j$
are necessarily different). Also write $D = \|\lambda_j\delta_{ij}\|$. Then

$$XD = (X\lambda_1 e_1, X\lambda_2 e_2, \ldots, X\lambda_n e_n) = (\lambda_1 x_1, \lambda_2 x_2, \ldots, \lambda_n x_n), \tag{7-37}$$

and

$$AX = (Ax_1, \ldots, Ax_n) = (\lambda_1 x_1, \ldots, \lambda_n x_n).$$

Hence

$$AX = XD,$$

or

$$D = X^{-1}AX. \tag{7-38}$$

Thus $A$ is similar to the diagonal matrix $D$ whose diagonal elements are
the eigenvalues of $A$. If $A$ is not symmetric, the matrix $X$ is not in general
an orthogonal matrix.

The above result shows that if the eigenvalues of $A$ are all different, $A$
can always be diagonalized by a similarity transformation. If the
eigenvalues of $A$ are not all different, $A$ can be diagonalized if it has n linearly
independent eigenvectors. If $A$ does not have n linearly independent
eigenvectors, $A$ cannot be diagonalized by a similarity transformation (proof?).
However, by a similarity transformation, any square matrix $A$ can always
be converted into a matrix with the following properties:

(1) All elements below the main diagonal vanish.

(2) The elements on the main diagonal are the eigenvalues of $A$, and
equal eigenvalues appear in adjacent positions on the diagonal.

(3) The only elements above the main diagonal which do not vanish
are those whose column index $j$ is equal to $i + 1$, where $i$ is the row index.
Any such nonvanishing element has the value unity. However, it can
have the value unity only if the diagonal elements in positions $i$ and $i + 1$
are equal. Thus a 5th-order nonsymmetric matrix with $\lambda_1 = \lambda_2 = \lambda_3$,
$\lambda_4 = \lambda_5$, $\lambda_4 \neq \lambda_3$ could be reduced to the unique form

$$\begin{bmatrix} \lambda_1 & \beta_1 & 0 & 0 & 0 \\ 0 & \lambda_2 & \beta_2 & 0 & 0 \\ 0 & 0 & \lambda_3 & 0 & 0 \\ 0 & 0 & 0 & \lambda_4 & \beta_3 \\ 0 & 0 & 0 & 0 & \lambda_5 \end{bmatrix},$$

where the value of each $\beta_i$ is either 0 or 1. This is called the Jordan
canonical form for the matrix $A$. We shall not attempt to prove the existence of
a similarity transformation which will reduce a matrix to its Jordan
canonical form.
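The two situations described in this section can be contrasted on second-order matrices. The sketch below is an illustration under stated assumptions: the matrices `A` and `J`, their eigenvectors, and the helper names are chosen here and do not come from the text. The first part diagonalizes a nonsymmetric matrix with two different eigenvalues by forming $X^{-1}AX$; the second part shows a matrix whose repeated eigenvalue has only one linearly independent eigenvector, so that no diagonalizing $X$ exists. Exact rational arithmetic is used.

```python
from fractions import Fraction

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def inverse_2x2(M):
    """Inverse of a nonsingular 2x2 matrix."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    if det == 0:
        raise ValueError("matrix is singular")
    return [[M[1][1] / det, -M[0][1] / det],
            [-M[1][0] / det, M[0][0] / det]]

def rank_2x2(M):
    """Rank of a 2x2 matrix: 2 if nonsingular, 1 if singular but nonzero, else 0."""
    if M[0][0] * M[1][1] - M[0][1] * M[1][0] != 0:
        return 2
    return 1 if any(value != 0 for row in M for value in row) else 0

# Part 1: nonsymmetric A with eigenvalues 2 and 5 (illustrative choice).
A = [[Fraction(4), Fraction(1)], [Fraction(2), Fraction(3)]]
# Columns of X are the eigenvectors [1, -2] (eigenvalue 2) and [1, 1] (eigenvalue 5).
X = [[Fraction(1), Fraction(1)], [Fraction(-2), Fraction(1)]]
D = matmul(matmul(inverse_2x2(X), A), X)    # D = X^{-1} A X

# Part 2: J has the eigenvalue 1 with multiplicity 2.
J = [[1, 1], [0, 1]]
J_minus_I = [[J[0][0] - 1, J[0][1]], [J[1][0], J[1][1] - 1]]
nullity = 2 - rank_2x2(J_minus_I)           # number of independent eigenvectors

print(D, nullity)
```

Here `X` is not orthogonal, since its columns are not orthogonal to each other, and `J` is already in Jordan canonical form with $\beta_1 = 1$.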

7-7 Quadratic forms. The techniques of linear algebra are often useful
in dealing with nonlinear expressions such as quadratic forms.
A quadratic form in n variables $x_1, \ldots, x_n$ is an expression

$$\begin{aligned}
F = \sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij}x_ix_j &= a_{11}x_1x_1 + a_{12}x_1x_2 + \cdots + a_{1n}x_1x_n \\
&\quad + a_{21}x_2x_1 + \cdots + a_{2n}x_2x_n + \cdots + a_{n1}x_nx_1 + \cdots + a_{nn}x_nx_n
\end{aligned} \tag{7-39}$$

which is a numerical function of the n variables. Equation (7-39)
determines a unique value of $F$ for any set of $x_j$. It is called a "quadratic"
form because each term $a_{ij}x_ix_j$ contains either the square of a variable
or the product of two different variables. Quadratic forms are important
in (1) deriving sufficient conditions for maxima and minima in analysis;
(2) quadratic programming; (3) approximating functions of n variables in
the neighborhood of some point (this technique is used in statistics, for
example, in response-surface analysis).

If we write $x = [x_1, \ldots, x_n]$, $A = \|a_{ij}\|$, Eq. (7-39) can be expressed
in the form of

$$F = \sum_{i=1}^{n} x_i \sum_{j=1}^{n} a_{ij}x_j = \sum_{i=1}^{n} x_i(Ax)_i = x'Ax; \tag{7-40}$$

i.e., if we use matrix notation, a quadratic form can be written $x'Ax$,
and $A$ is said to be the matrix associated with the form.

It should be noted that $a_{ij}$, $a_{ji}$ [Eq. (7-39)] are both coefficients of $x_ix_j$
when $i \neq j$ (since $x_ix_j = x_jx_i$), that is, the coefficient of $x_ix_j$ is
$a_{ij} + a_{ji}$ ($i \neq j$). If $a_{ij} \neq a_{ji}$, we can uniquely define new coefficients

$$b_{ij} = b_{ji} = \frac{a_{ij} + a_{ji}}{2} \quad (\text{all } i, j), \tag{7-41}$$

so that $b_{ij} + b_{ji} = a_{ij} + a_{ji}$, and $B = \|b_{ij}\| = B'$; hence $B$ is a
symmetric matrix. This redefinition of the coefficients does not change the
value $F$ for any $x$. Thus we can always assume that the matrix $A$
associated with the quadratic form $x'Ax$ [Eq. (7-40)] is symmetric; if it is not,
(7-41) can be used to convert it into a symmetric matrix. Note that the
element $a_{ij}$ of $A$ is the coefficient of $x_ix_j$ in (7-39), so that if any $a_{ij} = 0$,
the corresponding product of the variables $x_ix_j$ does not appear in the
quadratic form.

Examples: (1) A quadratic form in one variable is the expression $ax_1^2$.
(2) The most general quadratic form in two variables is

$$a_{11}x_1^2 + 2a_{12}x_1x_2 + a_{22}x_2^2;$$

in matrix form, this can be written

$$(x_1, x_2) \begin{bmatrix} a_{11} & a_{12} \\ a_{12} & a_{22} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}.$$

(3) The matrix

$$\begin{bmatrix} 4 & 2 \\ 4 & 6 \end{bmatrix}$$

associated with the quadratic form

$$4x_1^2 + 2x_1x_2 + 4x_2x_1 + 6x_2^2 = 4x_1^2 + 6x_1x_2 + 6x_2^2$$

is not symmetric. However, without changing the value of the form, we
can write $a_{12} = a_{21} = 3$ and obtain the symmetric matrix

$$\begin{bmatrix} 4 & 3 \\ 3 & 6 \end{bmatrix}.$$
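The redefinition (7-41) is easy to express in code. The sketch below, with the assumed helper names `quadratic_form` and `symmetrize`, applies it to the matrix of Example (3) and confirms at a few sample points that the value of the form is unchanged. The sample points are arbitrary choices made for the illustration.

```python
def quadratic_form(A, x):
    """Value of F = sum over i, j of a_ij * x_i * x_j, i.e. x'Ax."""
    n = len(x)
    return sum(A[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

def symmetrize(A):
    """Symmetric matrix B with b_ij = b_ji = (a_ij + a_ji) / 2, as in (7-41)."""
    n = len(A)
    return [[(A[i][j] + A[j][i]) / 2 for j in range(n)] for i in range(n)]

A = [[4, 2],
     [4, 6]]            # nonsymmetric matrix of Example (3)
B = symmetrize(A)       # symmetric matrix with b_12 = b_21 = 3

# Arbitrary sample points at which the two forms are compared.
samples = [[1, 0], [0, 1], [1, -2], [3, 5]]
values = [(quadratic_form(A, x), quadratic_form(B, x)) for x in samples]
print(B, values)
```

Only the sum $a_{ij} + a_{ji}$ enters the value of the form, which is why splitting it equally between the two positions leaves every value unchanged.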

7-8 Change of variables. It is often possible to simplify a quadratic
form $x'Ax$ by a change of variables $x = Ry$ or $y = R^{-1}x$, where $R$ is, of
course, a nonsingular matrix. We shall restrict ourselves to nonsingular
transformations because these alone are one-to-one transformations; that
is, a given $x$ determines a unique $y$, and a given $y$ determines a unique $x$.
Such transformations are invertible, and we can go from $x$ to $y$ or from
$y$ to $x$. Substitution of $x = Ry$ into $F = x'Ax$ gives

$$F = (Ry)'ARy = y'R'ARy = y'By, \tag{7-42}$$

where $B = R'AR$. In terms of the new variables $y$, the form $x'Ax$
becomes $y'By$, and $B = R'AR$. Note that if $A$ is a symmetric matrix, $B$
is also symmetric.

Congruence: A square matrix B is said to be congruent to the square
matrix A if there exists a nonsingular matrix R such that B = R'AR.

If B is congruent to A, then we say that B can be obtained by a congruence
transformation* on A. A congruence transformation is a special case of
an equivalence transformation; i.e., if B is congruent to A, B is equivalent
to A. The matrix B of the quadratic form y'By obtained by the nonsingular
transformation of the variables x = Ry in the form x'Ax is
congruent to A.

The determinant |A| is called the discriminant of the quadratic form
x'Ax. If B = R'AR is congruent to A, then the discriminant of the form
y'By is

$$|B| = |R'|\,|A|\,|R| = |R|^2 |A|;$$

that is, under a nonsingular change of the variables x = Ry, the discriminant
of the new quadratic form assumes a magnitude of $|R|^2$ times
that of the original form. The determinant |R| is sometimes called the
modulus of the transformation x = Ry. Note that the modulus of the
transformation $y = R^{-1}x$ is the reciprocal of that of x = Ry since
$RR^{-1} = I$, and hence $|R^{-1}| = 1/|R|$.
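
The relation $|B| = |R|^2 |A|$ for a congruence transformation can be verified on a small case. The sketch below is only an illustration in Python; the matrices A and R are hypothetical choices and the helper names are not from the text.

```python
def transpose(M):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]

def matmul(X, Y):
    """Matrix product XY for matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def det2(M):
    """Determinant of a 2 x 2 matrix."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

A = [[4, 3], [3, 6]]    # symmetric matrix of the form x'Ax (sample values)
R = [[1, 2], [0, 3]]    # hypothetical nonsingular R with modulus |R| = 3
B = matmul(matmul(transpose(R), A), R)    # congruent matrix B = R'AR

print(B)                                  # B is again symmetric
print(det2(B), det2(R) ** 2 * det2(A))    # both discriminants agree
```

Integer entries are used so that the comparison of the two determinants is exact.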


* Note that if B = R'AR, then B = SAS', where S = R'. Thus either
R'AR or RAR' can be used to define congruence and a congruence transformation.

If we allow x to vary over all of $E^n$, then the set of values taken on by
F = x'Ax is called the range of the quadratic form. Under a nonsingular
transformation of variables, the range of a quadratic form remains unchanged.
To prove this statement, let us suppose that we have the form x'Ax and
make the change of variables x = Ry or $y = R^{-1}x$ to obtain the new
form y'R'ARy = y'By. Now it is only necessary to note that for any x
there is a unique y (and, similarly, for any y there is a unique x) such that

F = x'Ax = y'By. (7-43) 

Hence x'Ax and y'By must have the same range. In general, this property 
will not hold if the matrix R is singular. For example, if R = 0, the range 
of y'By contains only the single number 0. 

7-9 Definite quadratic forms. Some quadratic forms have the property 
x'Ax > 0 for all x except x = 0; some are negative for all x except x = 0; 
and some can assume both positive and negative values. We introduce 
the following definitions: 

Positive definite quadratic form: The quadratic form x'Ax is said 
to be positive definite if it is positive (>0) for every x except x = 0. 

Positive semidefinite quadratic form: The quadratic form x'Ax is
said to be positive semidefinite if it is non-negative ($\geq 0$) for every x, and
there exist points $x \neq 0$ for which x'Ax = 0.

Negative definite and semidefinite forms are defined by interchanging the
words “negative” and “positive” in the above definitions. If x'Ax is positive
definite (semidefinite), then x'(-A)x is negative definite (semidefinite).

Indefinite forms: A quadratic form x'Ax is said to be indefinite if the 
form is positive for some points x and negative for others. 

A symmetric matrix A is often said to be positive definite, positive semidefinite,
negative definite, etc., if the respective quadratic form x'Ax is
positive definite, positive semidefinite, negative definite, etc.

Examples: (1) $F = 3x_1^2 + 5x_2^2$, $F = 2x_1^2 + 3x_2^2 + x_3^2$, $F = x_1^2$ are
positive definite forms in two, three, and one variable, respectively.

(2) $F = 4x_1^2 + x_2^2 - 4x_1 x_2 + 3x_3^2 = (2x_1 - x_2)^2 + 3x_3^2$ is positive
semidefinite since it is never negative and vanishes if $x_2 = 2x_1$, $x_3 = 0$.

(3) $F = -2x_1^2 - x_2^2$, $F = -x_1^2 - x_2^2$, $F = -x_1^2$ are negative definite
forms in two, two, and one variable, respectively.

(4) $F = 4x_1^2 - 3x_2^2$ is indefinite since it is positive when $x_1 = 1$,
$x_2 = 1$ and negative when $x_1 = 0$, $x_2 = 1$.

A positive (negative) definite form remains positive (negative) definite
when expressed in terms of a new set of variables provided the transformation
of the variables is nonsingular. Thus if x'Ax is positive (negative) definite
and R is nonsingular, then y'R'ARy = y'By (x = Ry) is positive (negative)
definite. The proof is simple: Since we know that the range of y'By
is the same as that of x'Ax, it is only necessary to show that y = 0 is the
only y for which y'By = 0. Now the form x'Ax = 0 only if x = 0.
However, $y = R^{-1}x$ and hence y = 0 is the only value of y for which
x = 0. Of course, semidefinite and indefinite forms remain semidefinite
and indefinite, respectively, under a nonsingular transformation of variables.

7-10 Diagonalization of quadratic forms. Given a quadratic form x'Ax,
let us consider the nonsingular transformation of variables x = Qy,
where the columns of matrix Q are an orthonormal set of eigenvectors
for A. The matrix Q is therefore an orthogonal matrix, and the transformation
of variables is called an orthogonal transformation. In terms of
the variables y, the quadratic form becomes

$$y'Q'AQy = y'Dy,$$

and $D = \|\lambda_j \delta_{ij}\|$ is a diagonal matrix whose diagonal elements are the
eigenvalues of A, in agreement with Section 7-5, where we showed* that
D = Q'AQ. Thus

$$y'Dy = \sum_{j=1}^{n} \lambda_j y_j^2. \tag{7-44}$$

Only the squares of the variables appear; there are no cross products
$y_i y_j$ $(i \neq j)$.

A quadratic form containing only the squares of the variables is said to
be in diagonal form. Furthermore, we say that the transformation of
variables x = Qy has diagonalized the quadratic form x'Ax. A quadratic
form will be in diagonal form if the matrix associated with the form is a
diagonal matrix. We have proved that by an orthogonal transformation of
variables every quadratic form x'Ax may be reduced to a diagonal form (7-44).
Furthermore, in the transformation of variables x = Qy, the matrix Q has
as its columns a set of orthonormal eigenvectors of A which span $E^n$. In the
diagonal form (7-44), the coefficient of $y_j^2$ is the eigenvalue $\lambda_j$ of A. It should
be noted at this point that it does not automatically follow that if a
quadratic form x'Ax has been reduced to diagonal form by a change of
variables x = Ry, the coefficients of the $y_j^2$ are the eigenvalues of A.
Later a transformation of variables in which R is not orthogonal will be
introduced which will diagonalize the form; however, the coefficients of
the $y_j^2$ will not, in general, be the eigenvalues of A.


* If Q is an orthogonal matrix, a similarity transformation $Q^{-1}AQ$ is also a
congruence transformation Q'AQ.

If we know the eigenvalues of A, we can immediately determine whether 
the form x'Ax is positive definite, indefinite, etc. We can do this because: 
(a) the range of a form is unchanged under a nonsingular transformation 
of variables; (b) a positive or negative definite form remains positive or 
negative definite under a nonsingular transformation of variables; (c) the 
transformation x = Qy discussed above reduces x'Ax to the diagonal 
form (7-44). 

If each eigenvalue of A is positive (negative), then the only value of y
for which (7-44) vanishes is y = 0. Hence (7-44) is positive (negative)
definite and by (b) x'Ax is positive (negative) definite. Suppose that all
the eigenvalues of A are non-negative (nonpositive), but one or more of
the eigenvalues are zero, say $\lambda_n = 0$. Then (7-44) will always be
non-negative (nonpositive). However, if we set $y_1 = y_2 = \cdots = y_{n-1} = 0$,
Eq. (7-44) vanishes for any value of $y_n$; hence there exist $y \neq 0$ for which
(7-44) is zero. For any such $y \neq 0$, there is an $x = Qy \neq 0$ such that
x'Ax = 0, and by (a) the form x'Ax is positive (negative) semidefinite.
If A has both positive and negative eigenvalues, (7-44) is indefinite and
by (a) x'Ax is indefinite. These results show that:

(1) x'Ax is positive (negative) definite if and only if every eigenvalue of A
is positive (negative).

(2) x'Ax is positive (negative) semidefinite if and only if all eigenvalues of
A are non-negative (nonpositive), and at least one of the eigenvalues vanishes.

(3) x'Ax is indefinite if and only if A has both positive and negative
eigenvalues.
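
The three criteria above translate directly into a test on the signs of the eigenvalues. The following Python sketch is restricted to forms in two variables, for which the eigenvalues of a symmetric matrix have a simple closed form; the function names, the tolerance `tol`, and the label for the identically zero form are assumptions made for this illustration.

```python
import math

def eigenvalues_sym2(a11, a12, a22):
    """Eigenvalues of the symmetric matrix [[a11, a12], [a12, a22]]."""
    mean = (a11 + a22) / 2.0
    radius = math.hypot((a11 - a22) / 2.0, a12)
    return mean - radius, mean + radius

def classify(eigs, tol=1e-12):
    """Classify x'Ax from the eigenvalues of A, following criteria (1)-(3)."""
    lo, hi = min(eigs), max(eigs)
    if lo > tol:
        return "positive definite"
    if hi < -tol:
        return "negative definite"
    if lo < -tol and hi > tol:
        return "indefinite"
    if hi > tol:
        return "positive semidefinite"
    if lo < -tol:
        return "negative semidefinite"
    return "zero form"   # every eigenvalue vanishes (A = 0)

print(classify(eigenvalues_sym2(3.0, 2.0, 2.0)))
print(classify(eigenvalues_sym2(2.0, math.sqrt(2.0), 1.0)))
print(classify(eigenvalues_sym2(4.0, 0.0, -3.0)))
```

A tolerance is needed because an eigenvalue that is exactly zero in theory will usually be computed as a very small nonzero floating-point number.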

Example: Consider the quadratic form

$$F = 2x_1^2 + 2\sqrt{2}\, x_1 x_2 + x_2^2 = x'Ax; \tag{7-45}$$

the symmetric matrix A is then

$$A = \begin{bmatrix} 2 & \sqrt{2} \\ \sqrt{2} & 1 \end{bmatrix}.$$

To diagonalize the form, we use the transformation of variables x = Qy,
where the columns of Q are the orthonormal eigenvectors of A. The
eigenvectors of A (Section 7-3, p. 242) are

$$\frac{1}{\sqrt{3}}[-1, \sqrt{2}], \qquad \frac{1}{\sqrt{3}}[\sqrt{2}, 1].$$

Thus

$$Q = \frac{1}{\sqrt{3}} \begin{bmatrix} -1 & \sqrt{2} \\ \sqrt{2} & 1 \end{bmatrix},$$

and the transformation of variables is

$$\begin{aligned} x_1 &= \frac{1}{\sqrt{3}}(-y_1 + \sqrt{2}\, y_2), & y_1 &= \frac{1}{\sqrt{3}}(-x_1 + \sqrt{2}\, x_2), \\ x_2 &= \frac{1}{\sqrt{3}}(\sqrt{2}\, y_1 + y_2), & y_2 &= \frac{1}{\sqrt{3}}(\sqrt{2}\, x_1 + x_2). \end{aligned} \tag{7-46}$$

Note that $Q^{-1} = Q'$; hence it is very easy to find the inverse transformation
for an orthogonal transformation of variables. Since the eigenvalues
of A are $\lambda_1 = 0$, $\lambda_2 = 3$, the form F becomes $F = 3y_2^2$ under this
transformation of variables. The form is therefore positive semidefinite. The
point y = [2, 0] causes the form to vanish. The x corresponding to
this y is $x = [-2/\sqrt{3},\ 2\sqrt{2/3}]$, and of course, (7-45) vanishes for this value
of x. It is suggested that the reader introduce the new variables into
(7-45) by direct substitution, and show that F reduces to $F = 3y_2^2$.
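
The substitution suggested to the reader can also be checked numerically. The sketch below, in Python with floating-point arithmetic, forms Q'AQ for the matrices of this example and evaluates (7-45) at the x corresponding to y = [2, 0]; the helper names are illustrative, and the entries of Q are those consistent with the eigenvalues 0 and 3 quoted in the example.

```python
import math

def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(X, Y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def quad_form(M, v):
    """Value of v'Mv."""
    n = len(v)
    return sum(M[i][j] * v[i] * v[j] for i in range(n) for j in range(n))

s2, s3 = math.sqrt(2.0), math.sqrt(3.0)
A = [[2.0, s2], [s2, 1.0]]                         # matrix of the form (7-45)
Q = [[-1.0 / s3, s2 / s3], [s2 / s3, 1.0 / s3]]    # orthonormal eigenvectors as columns

D = matmul(matmul(transpose(Q), A), Q)             # should be close to diag(0, 3)
y = [2.0, 0.0]
x = [sum(Q[i][j] * y[j] for j in range(2)) for i in range(2)]   # x = Qy

print(D)
print(x, quad_form(A, x))                          # the form vanishes at this x
```

Up to rounding error, D is the diagonal matrix of eigenvalues, so F reduces to $3y_2^2$.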


7-11 Diagonalization by completion of the square. Another procedure
for diagonalizing quadratic forms is a generalization of the familiar technique
of completing the square, learned in elementary algebra. Consider
the quadratic form in two variables

$$F = a_{11} x_1^2 + 2a_{12} x_1 x_2 + a_{22} x_2^2. \tag{7-47}$$

If either $a_{11}$ or $a_{22}$ is not zero, we can assume without loss of generality
that $a_{11}$ is not zero. Then (7-47) can be written

$$F = a_{11}\left[x_1^2 + 2\frac{a_{12}}{a_{11}} x_1 x_2 + \left(\frac{a_{12}}{a_{11}}\right)^2 x_2^2 - \left(\frac{a_{12}}{a_{11}}\right)^2 x_2^2 + \frac{a_{22}}{a_{11}} x_2^2\right] = a_{11}\left(x_1 + \frac{a_{12}}{a_{11}} x_2\right)^2 + \left(a_{22} - \frac{a_{12}^2}{a_{11}}\right) x_2^2. \tag{7-48}$$

If we introduce the transformation of variables,

$$y_1 = x_1 + \frac{a_{12}}{a_{11}} x_2, \quad y_2 = x_2, \quad \text{or} \quad y = Sx = \begin{bmatrix} 1 & a_{12}/a_{11} \\ 0 & 1 \end{bmatrix} x, \tag{7-49}$$

(7-48) becomes

$$F = a_{11} y_1^2 + \left(a_{22} - \frac{a_{12}^2}{a_{11}}\right) y_2^2, \tag{7-50}$$

and the form (7-47) has been diagonalized. The transformation of variables
is nonsingular (|S| = 1), but it is not orthogonal. The coefficients
of $y_1^2$, $y_2^2$ in (7-50) are not, in general, the eigenvalues of A.

In the event that $a_{11}$, $a_{22}$ both vanish, the above procedure will not
work. When $a_{11} = a_{22} = 0$, (7-47) becomes

$$F = 2a_{12} x_1 x_2. \tag{7-51}$$

Now make the transformation

$$x_1 = y_1 + y_2, \qquad x_2 = y_1 - y_2.$$

This is a nonsingular transformation which reduces (7-51) to

$$F = 2a_{12}(y_1^2 - y_2^2).$$

Hence in this case also the form has been diagonalized. The procedure
just outlined can be generalized to diagonalize any quadratic form.

In the text, we shall discuss only reductions for positive definite and
negative definite forms, while the generalization to arbitrary forms will
be the subject of some of the problems. Let

$$F = x'Ax = \sum_{i,j=1}^{n} a_{ij} x_i x_j \tag{7-52}$$

be a positive definite quadratic form. The terms in (7-52) involving $x_1$
are

$$a_{11} x_1^2 + 2a_{12} x_1 x_2 + \cdots + 2a_{1n} x_1 x_n. \tag{7-53}$$

Since the form is positive definite, it must be positive when
$x_2 = x_3 = \cdots = x_n = 0$ and $x_1 \neq 0$. Then $F = a_{11} x_1^2$, and $a_{11}$ must be
positive. Hence (7-53) can be written

$$a_{11} x_1^2 + 2\sum_{k=2}^{n} a_{1k} x_1 x_k = a_{11}\left[x_1^2 + 2\sum_{k=2}^{n} \frac{a_{1k}}{a_{11}} x_1 x_k\right] = a_{11}\left[\left(x_1 + \sum_{k=2}^{n} \frac{a_{1k}}{a_{11}} x_k\right)^2 - \left(\sum_{k=2}^{n} \frac{a_{1k}}{a_{11}} x_k\right)^2\right], \tag{7-54}$$

and the following transformation of variables suggests itself:

$$v_1 = x_1 + \sum_{k=2}^{n} \frac{a_{1k}}{a_{11}} x_k, \quad v_2 = x_2, \quad \ldots, \quad v_n = x_n, \tag{7-55}$$

or $v = S_1 x$, where

$$S_1 = \begin{bmatrix} 1 & a_{12}/a_{11} & \cdots & a_{1n}/a_{11} \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix} \quad \text{and} \quad |S_1| = 1.$$

Thus a nonsingular transformation of variables has reduced F to

$$F = a_{11} v_1^2 + \sum_{i,j=2}^{n} b_{ij} v_i v_j, \tag{7-56}$$

and $a_{11} > 0$. Furthermore, (7-56) is positive definite.

This procedure is now repeated. Since (7-56) is positive definite,
$b_{22} > 0$. We complete the square for the variable $v_2$, define another
transformation of variables,

$$w_1 = v_1, \quad w_2 = v_2 + \sum_{k=3}^{n} \frac{b_{2k}}{b_{22}} v_k, \quad w_3 = v_3, \quad \ldots, \quad w_n = v_n, \tag{7-57}$$

or $w = S_2 v$, $|S_2| = 1$, and obtain the form

$$F = a_{11} w_1^2 + b_{22} w_2^2 + \sum_{i,j=3}^{n} c_{ij} w_i w_j, \tag{7-58}$$

with $a_{11}, b_{22} > 0$. Repeating this process $n - 1$ times, we arrive at

$$F = a_{11} y_1^2 + b_{22} y_2^2 + c_{33} y_3^2 + \cdots + z_{nn} y_n^2; \tag{7-59}$$

the coefficient of each $y_j^2$ is positive. The last step yields two square terms,
and all cross products disappear. The nonsingular transformation y = Sx
which reduces (7-52) to (7-59) is

$$S = S_{n-1} S_{n-2} \cdots S_2 S_1, \tag{7-60}$$

and |S| = 1 since each $|S_i| = 1$.

The same procedure can be used to diagonalize a negative definite form.
It should be noted that S is a triangular matrix, i.e., all elements below
the main diagonal vanish; hence S is not orthogonal (unless S = I). The
elements $a_{11}$, $b_{22}$, etc., in (7-59) are not, in general, the eigenvalues of A.
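
The repeated completion of the square described above can be sketched as a short routine. The following Python code is a minimal illustration using exact rational arithmetic; the name `complete_squares`, the error handling, and the sample matrices are assumptions of this sketch, and the loop runs over all n variables (the last pass is trivial) rather than stopping after $n - 1$ steps.

```python
from fractions import Fraction

def matmul(X, Y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def identity(n):
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]

def complete_squares(A):
    """Reduce a positive definite form x'Ax to sum_k d_k y_k^2 with y = Sx.

    Returns (d, S). A non-positive pivot means the form is not positive
    definite, and a ValueError is raised.
    """
    n = len(A)
    M = [[Fraction(v) for v in row] for row in A]   # coefficients still to be reduced
    S = identity(n)
    d = []
    for k in range(n):
        pivot = M[k][k]
        if pivot <= 0:
            raise ValueError("form is not positive definite")
        d.append(pivot)
        # S_k differs from the identity only in row k, as in (7-55) and (7-57).
        Sk = identity(n)
        for j in range(k + 1, n):
            Sk[k][j] = M[k][j] / pivot
        S = matmul(Sk, S)
        # Coefficients of the form that remains after removing the k-th square.
        for i in range(k + 1, n):
            for j in range(k + 1, n):
                M[i][j] -= M[i][k] * M[k][j] / pivot
    return d, S

d, S = complete_squares([[3, 2], [2, 2]])
print(d)   # coefficients of y_1^2 and y_2^2
print(S)   # unit upper triangular, so |S| = 1
```

With these conventions, S' diag(d) S reproduces A, which gives a direct way to check the reduction.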


7-12 Another set of necessary and sufficient conditions for positive
and negative definite forms. Since we often wish to establish whether a
quadratic form x'Ax is positive definite without determining the eigenvalues
of A, we shall now develop another set of conditions that will enable
us to make such a decision. If the order n of A is large, these conditions
are not easily applicable to practical problems, but they are useful
in theoretical work.

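A set of necessary and sufficient conditions for the form x'Ax to be positive
definite is

$$a_{11} > 0; \quad \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} > 0; \quad \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} > 0; \quad \ldots; \quad |A| > 0. \tag{7-61}$$

If these n minors of A are positive, x'Ax is positive definite; and x'Ax is
positive definite only if these minors are positive.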

To prove the necessity, assume that x'Ax is positive definite. Then
there exists a nonsingular transformation (see Section 7-11) y = Sx or
x = Ry, $S = R^{-1}$, with modulus unity which reduces the form to

$$y'Dy = \sum_{i=1}^{n} d_i y_i^2 \quad (d_i > 0,\ i = 1, \ldots, n). \tag{7-62}$$

However, D = R'AR, and since |R| = 1/|S| = 1,

$$|D| = d_1 d_2 \cdots d_n = |R|^2 |A| = |A|; \tag{7-63}$$

thus |A| > 0.

Next we set $x_n = 0$. The resulting form in $n - 1$ variables is also positive
definite. The matrix of its coefficients is obtained by crossing off the
last row and column of A. If this form is diagonalized by the method
described in Section 7-11, we obtain $\sum_{i=1}^{n-1} d_i y_i^2$, where the $d_i$ are, in fact,
the same as in (7-62), the only difference being that the term $d_n y_n^2$ does
not appear. Thus

$$\begin{vmatrix} a_{11} & \cdots & a_{1,n-1} \\ \vdots & & \vdots \\ a_{n-1,1} & \cdots & a_{n-1,n-1} \end{vmatrix} = d_1 d_2 \cdots d_{n-1} > 0. \tag{7-64}$$

If we now set $x_n = x_{n-1} = 0$ in the original form, then $y_n = y_{n-1} = 0$
in (7-62); hence

$$\begin{vmatrix} a_{11} & \cdots & a_{1,n-2} \\ \vdots & & \vdots \\ a_{n-2,1} & \cdots & a_{n-2,n-2} \end{vmatrix} = d_1 d_2 \cdots d_{n-2} > 0. \tag{7-65}$$

Continuing in this way, we see that if x'Ax is positive definite, Eq. (7-61)
holds.

To prove the sufficiency, let us suppose that (7-61) holds for the form
x'Ax. We wish to show that x'Ax is positive definite. Since $a_{11} > 0$,
we can perform a nonsingular transformation $x = R_1 v$ with unit modulus,
of the type discussed in Section 7-11, and obtain

$$a_{11} v_1^2 + \sum_{i,j=2}^{n} b_{ij} v_i v_j. \tag{7-66}$$

To demonstrate that $b_{22} > 0$, we begin by noting that the coefficients
are independent of the values of the variables $v_i$ (or $x_i$). Thus, if we set
$x_i = v_i = 0$ $(i = 3, \ldots, n)$, (7-66) becomes a form in two variables
whose discriminant is $a_{11} b_{22}$. When the above variables are set to zero,
the original form x'Ax reduces to a form in two variables whose discriminant
is

$$\begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} > 0.$$

The discriminants are equal since the modulus of the transformation is
unity. Thus $a_{11} b_{22} > 0$, and $b_{22} > 0$ since $a_{11} > 0$.

Another nonsingular transformation with unit modulus reduces the
form (7-66) to

$$a_{11} w_1^2 + b_{22} w_2^2 + \sum_{i,j=3}^{n} c_{ij} w_i w_j. \tag{7-67}$$

Setting $w_i = v_i = x_i = 0$ $(i = 4, \ldots, n)$, we can see that $c_{33}$ is positive:
equating discriminants as before gives

$$a_{11} b_{22} c_{33} = \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} > 0,$$

and $c_{33} > 0$ since $a_{11}, b_{22} > 0$. This process can be continued until we
obtain

$$\sum_{i=1}^{n} d_i y_i^2 \quad (d_i > 0,\ i = 1, \ldots, n). \tag{7-68}$$

This form is clearly positive definite. Hence the original form is also positive
definite because it may be obtained from (7-68) by a nonsingular
transformation of variables.

Equation (7-61) represents only one set of necessary and sufficient
conditions for a positive definite form. The variables do not have any
specific property that made us call $x_1$ the first variable (it could have been
$x_7$ or any other variable); that is, we can renumber the variables in any
way we choose. Thus the determinants formed by permutations of the
subscripts will also serve as a set of necessary and sufficient conditions.
In general,

$$a_{kk} > 0; \quad \begin{vmatrix} a_{kk} & a_{kj} \\ a_{jk} & a_{jj} \end{vmatrix} > 0; \quad \begin{vmatrix} a_{kk} & a_{kj} & a_{ki} \\ a_{jk} & a_{jj} & a_{ji} \\ a_{ik} & a_{ij} & a_{ii} \end{vmatrix} > 0; \quad \ldots, \quad |A| > 0, \tag{7-69}$$

where (k, j, i, . . .) is any permutation of the set of integers (1, 2, . . . , n),
represents a set of necessary and sufficient conditions ensuring that
x'Ax will be positive definite. Note that the permutation of subscripts
never affects the sign of |A| since both rows and columns of A are interchanged
in the process.

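The determinants in (7-61) are found from A as follows: the kth determinant
is formed from the elements common to the first k rows and the first k
columns of A, so that the successive determinants are nested in the upper
left-hand corner of

$$A = \begin{bmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\ a_{21} & a_{22} & a_{23} & \cdots & a_{2n} \\ a_{31} & a_{32} & a_{33} & \cdots & a_{3n} \\ \vdots & \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & a_{n3} & \cdots & a_{nn} \end{bmatrix}.$$

These determinants are called the naturally ordered principal minors of A.
A necessary and sufficient condition for the form x'Ax or the symmetric
matrix A to be positive definite is that the naturally ordered principal minors
of A are all positive. It should be observed that if A is positive definite,
|A| > 0, and hence A is nonsingular.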
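
The test on the naturally ordered principal minors can be written out directly for small matrices. The following Python sketch uses cofactor expansion, which is adequate only for small n (in line with the remark that these conditions are not convenient for large matrices); the function names and sample matrices are illustrative assumptions.

```python
def det(M):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    if len(M) == 1:
        return M[0][0]
    total = 0
    for j, a in enumerate(M[0]):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * a * det(minor)
    return total

def leading_minors(A):
    """Naturally ordered principal minors of A, as in (7-61)."""
    return [det([row[:k] for row in A[:k]]) for k in range(1, len(A) + 1)]

def is_positive_definite(A):
    """True if every naturally ordered principal minor of symmetric A is positive."""
    return all(m > 0 for m in leading_minors(A))

print(leading_minors([[3, 2], [2, 2]]))
print(is_positive_definite([[2, 1, 0], [1, 2, 1], [0, 1, 2]]))
print(is_positive_definite([[1, 2], [2, 4]]))   # |A| = 0, so not positive definite
```

Integer or rational entries keep the sign tests exact; with floating-point entries a minor that should vanish may come out slightly positive or negative.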

Using the criteria for positive definite forms, we can easily derive a set
of necessary and sufficient conditions ensuring that a form x'Ax or the
symmetric matrix A will be negative definite. If x'Ax is negative definite,
then x'(-A)x is positive definite; furthermore, we recall that
$|-A| = (-1)^n |A|$. A set of necessary and sufficient conditions for x'Ax to be
negative definite, or equivalently, for x'(-A)x to be positive definite then
follows immediately from (7-61), i.e.,

$$a_{11} < 0; \quad \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} > 0; \quad \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} < 0; \quad \ldots; \quad (-1)^n |A| > 0, \tag{7-70}$$

where the $a_{ij}$ are the elements of A (not -A).
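
As a small worked illustration of the alternating-sign conditions (7-70), consider the following symmetric matrix. It is a made-up example chosen for this note, not one taken from the text.

```latex
A = \begin{bmatrix} -2 & 1 \\ 1 & -3 \end{bmatrix}: \qquad
a_{11} = -2 < 0, \qquad
(-1)^2 |A| = (-2)(-3) - (1)(1) = 5 > 0 .
```

Both conditions hold, so $x'Ax = -2x_1^2 + 2x_1 x_2 - 3x_2^2$ is negative definite. Equivalently, the naturally ordered principal minors of -A are 2 and 5, both positive, so x'(-A)x is positive definite.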

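Example: The form $3x_1^2 + 4x_1 x_2 + 2x_2^2$, which can be written

$$(x_1, x_2) \begin{bmatrix} 3 & 2 \\ 2 & 2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix},$$

is positive definite since

$$a_{11} = 3 > 0; \quad \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} = \begin{vmatrix} 3 & 2 \\ 2 & 2 \end{vmatrix} = 2 > 0.$$

If we introduce the transformation of variables (see Section 7-11),

$$y_1 = x_1 + \tfrac{2}{3} x_2, \quad y_2 = x_2 \quad \text{or} \quad x_1 = y_1 - \tfrac{2}{3} y_2, \quad x_2 = y_2,$$

the form becomes

$$\begin{aligned} 3\left(y_1 - \tfrac{2}{3} y_2\right)^2 + 4\left(y_1 - \tfrac{2}{3} y_2\right) y_2 + 2y_2^2 &= 3y_1^2 - 4y_1 y_2 + \tfrac{4}{3} y_2^2 + 4y_1 y_2 - \tfrac{8}{3} y_2^2 + 2y_2^2 \\ &= 3y_1^2 + \tfrac{2}{3} y_2^2. \end{aligned}$$

The form has been reduced to a sum of squares whose coefficients are all
positive.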

7-13 Simultaneous diagonalization of two quadratic forms. The analysis 
of certain problems in mechanics and economics can be considerably 
simplified by introducing a nonsingular transformation of variables which 
will simultaneously diagonalize two quadratic forms. In general, one of 
the two forms will be positive definite (or negative definite), and for these 
two cases it is indeed possible to find such a nonsingular transformation 
of variables. We shall now show how this can be done. 

Let $F_1 = x'Ax$ be a positive definite quadratic form in n variables,
and $F_2 = x'Bx$ any other quadratic form in n variables. We wish to find
a nonsingular transformation x = Ry which simultaneously diagonalizes
both forms. Let $Q_1$ be an orthogonal matrix whose columns are an orthonormal
set of eigenvectors for A. Introducing the transformation of
variables $x = Q_1 w$, we find

$$F_1 = w'Q_1'AQ_1 w = w'Dw = \sum_{j=1}^{n} \lambda_j w_j^2, \tag{7-71}$$

$$F_2 = w'Q_1'BQ_1 w = w'\hat{B}w, \tag{7-72}$$

where $D = \|\lambda_j \delta_{ij}\|$ (the $\lambda_j$ being the eigenvalues of A) and $\hat{B} = Q_1'BQ_1$.
Since $F_1$ is positive definite, each $\lambda_j > 0$.

Next we introduce the nonsingular (but not orthogonal) transformation
of variables

$$w_j = |\lambda_j|^{-1/2} z_j \quad \text{or} \quad w = Hz, \quad \text{where} \quad H = \|\,|\lambda_j|^{-1/2} \delta_{ij}\|. \tag{7-73}$$

Then (7-71) and (7-72) become

$$F_1 = z'H'DHz = z'Iz = \sum_{j=1}^{n} z_j^2, \tag{7-74}$$

$$F_2 = z'H'\hat{B}Hz = z'Cz, \quad \text{where} \quad C = H'\hat{B}H. \tag{7-75}$$





Finally, we construct an orthogonal matrix $Q_2$ whose columns are an
orthonormal set of eigenvectors for C and introduce the transformation of
variables $z = Q_2 y$. In terms of the variables y, (7-74) and (7-75) become

$$F_1 = y'Q_2'IQ_2 y = y'Q_2'Q_2 y = y'Iy \quad (Q_2'Q_2 = I) \tag{7-76}$$

and

$$F_2 = y'Q_2'CQ_2 y = y'\hat{D}y, \quad \text{where} \quad \hat{D} = \|\hat{\lambda}_j \delta_{ij}\|; \tag{7-77}$$

the $\hat{\lambda}_j$ are the eigenvalues of C. Thus both forms have been diagonalized
by the nonsingular transformation of variables x = Ry, where

$$R = Q_1 H Q_2. \tag{7-78}$$

This transformation is not orthogonal, and hence the congruence transformation
which diagonalizes A, B is not a similarity transformation.
This transformation which diagonalizes both matrices reduces A to the
identity matrix, not to a matrix with the eigenvalues of A as the diagonal
elements.
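
The three-step construction $R = Q_1 H Q_2$ can be traced numerically for forms in two variables. The sketch below is written in Python with floating-point arithmetic; the helper `eig_sym2` diagonalizes a symmetric 2 x 2 matrix by a plane rotation, which is a technique chosen for this illustration rather than one described in the text, and the matrices A and B are hypothetical.

```python
import math

def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(X, Y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def eig_sym2(M):
    """Eigenvalues and an orthogonal matrix of eigenvectors (as columns)
    for a symmetric 2 x 2 matrix, found with a single plane rotation."""
    a, b, c = M[0][0], M[0][1], M[1][1]
    theta = 0.5 * math.atan2(2.0 * b, a - c)
    cs, sn = math.cos(theta), math.sin(theta)
    Q = [[cs, -sn], [sn, cs]]
    lam = [a * cs * cs + 2.0 * b * cs * sn + c * sn * sn,
           a * sn * sn - 2.0 * b * cs * sn + c * cs * cs]
    return lam, Q

def simultaneous_diagonalize(A, B):
    """Return (R, mu) with R'AR = I and R'BR = diag(mu); A must be positive definite."""
    lam, Q1 = eig_sym2(A)                      # x = Q1 w diagonalizes A
    if min(lam) <= 0.0:
        raise ValueError("A must be positive definite")
    H = [[lam[0] ** -0.5, 0.0], [0.0, lam[1] ** -0.5]]   # w = Hz, as in (7-73)
    Q1H = matmul(Q1, H)
    C = matmul(matmul(transpose(Q1H), B), Q1H)           # matrix of F_2 in terms of z
    mu, Q2 = eig_sym2(C)                       # z = Q2 y diagonalizes C
    return matmul(Q1H, Q2), mu                 # R = Q1 H Q2, as in (7-78)

A = [[3.0, 2.0], [2.0, 2.0]]    # positive definite (sample values)
B = [[1.0, 2.0], [2.0, -1.0]]   # any other symmetric matrix (sample values)
R, mu = simultaneous_diagonalize(A, B)
RtAR = matmul(matmul(transpose(R), A), R)
RtBR = matmul(matmul(transpose(R), B), R)
print(RtAR)   # close to the identity matrix
print(RtBR)   # close to diag(mu)
```

The printed matrices agree with I and with a diagonal matrix only up to rounding error, so comparisons should use a small tolerance.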

If A were negative definite instead of positive definite, the same transformation
of variables would reduce A to -I. It is important to note why
a transformation which simultaneously diagonalizes both forms can always
be found only if one of the forms is positive (negative) definite. The key to
a simultaneous diagonalization is to find a congruence transformation
which will reduce A to an identity matrix. If, after introducing the transformation
$x = Q_1 w$, we were to perform the transformation $w = Q_3 y$,
where $Q_3$ contained an orthonormal set of eigenvectors for the matrix
$Q_1'BQ_1$ of (7-72), $F_1$ may not be in diagonal form because $Q_3'DQ_3$ may not
be a diagonal matrix. If A is not positive (negative) definite, we cannot
make a transformation of the type (7-73), which reduces A to the identity
matrix. For example, if A is indefinite and each $\lambda_j \neq 0$, the transformation
(7-73) reduces A to a diagonal matrix G with diagonal elements $\pm 1$, and
with at least one element being -1. However, $Q_2'GQ_2$ may not be diagonal.
If a $\lambda_j = 0$, we cannot make the transformation (7-73) for $w_j$. Again, it
will be impossible to convert A to an identity matrix; hence if the transformation
diagonalizing $F_2$ is introduced, $F_1$ will not remain diagonal.

7-14 Geometric interpretation; coordinates and bases. Sections 7-14
through 7-16 will be devoted to the geometric interpretation of a number
of concepts discussed in the present chapter. We shall begin by extending
the notion of generalized coordinate systems, introduced in Chapter 2.
We noted there that any set of basis vectors for $E^n$ can be thought of as
defining a coordinate system for $E^n$. If $a_1, \ldots, a_n$ is a basis for $E^n$, then
any vector x in $E^n$ can be written as a linear combination of the basis
vectors

$$x = \sum_{j=1}^{n} \alpha_j a_j, \tag{7-79}$$

and the $\alpha_j$ are called the coordinates of x with respect to the coordinate
system defined by the basis vectors $a_1, \ldots, a_n$. If the $a_j$ are mutually
orthogonal, they define an orthogonal coordinate system.

Let us consider another basis, $b_1, \ldots, b_n$, for $E^n$. This basis also
defines a coordinate system for $E^n$, and any x in $E^n$ can be written

$$x = \sum_{j=1}^{n} \beta_j b_j, \tag{7-80}$$

where the $\beta_j$ are the coordinates of x with respect to the coordinate system
defined by the basis vectors $b_j$. Now it is possible to write any $b_j$ of the
second basis as a linear combination of $a_1, \ldots, a_n$, that is,

$$b_j = \sum_{i=1}^{n} s_{ij} a_i \quad (j = 1, \ldots, n); \tag{7-81}$$

and if

$$A = (a_1, \ldots, a_n), \quad B = (b_1, \ldots, b_n), \quad S = \|s_{ij}\| = (s_1, \ldots, s_n), \tag{7-82}$$

then

$$B = AS = (As_1, \ldots, As_n). \tag{7-83}$$

Equation (7-83) tells us how the two sets of basis vectors are related.

Next we wish to relate the coordinates $\beta_j$ of x relative to the coordinate
system defined by the $b_j$ to the coordinates $\alpha_j$ of the coordinate system
defined by the $a_j$. We write

$$\alpha = [\alpha_1, \ldots, \alpha_n], \quad \beta = [\beta_1, \ldots, \beta_n].$$

Then

$$x = A\alpha = B\beta;$$

using (7-83), we obtain

$$x = A\alpha = AS\beta, \tag{7-84}$$

or, since A and S are nonsingular (why?),

$$\beta = S^{-1}\alpha. \tag{7-85}$$

Equation (7-85) gives the relation between the coordinates in the two
coordinate systems, and (7-83) expresses the relation between the basis
vectors for the two systems. If we write (7-83) as

$$B' = S'A', \tag{7-86}$$

it follows that (7-86) can be obtained from (7-85) by replacing $\beta$ by B',
$S^{-1}$ by S', and $\alpha$ by A'. Because of the relation between (7-85) and (7-86)
it is often stated that the matrix giving the transformation of the coordinates
is the reciprocal transpose of that giving the transformation of the
basis vectors. Thus we have developed the general equations describing
the change from one coordinate system to another in $E^n$. In terms of the
notation of Chapter 2, we would write $x_a = \alpha$, $x_b = \beta$ since $\alpha$ can be
thought of as the representation of x in the coordinate system defined by
the $a_j$, etc., for $\beta$.

Equation (7-85) suggests that for any nth-order nonsingular matrix R,
the transformation y = Rx can be imagined to relate the coordinates of a
given vector in two different coordinate systems for $E^n$. In one coordinate
system, the vector can be represented by x (for example, in the orthogonal
coordinate system defined by the unit vectors $e_j$), and in the other coordinate
system, the vector is represented by y. If A contains as columns
the basis vectors for the coordinate system where the vector is represented
by x, then the matrix B whose columns are the basis vectors for
the coordinate system where the vector is represented by y is given by
$B = AR^{-1}$.

This interpretation of y = Rx differs from that in Section 4-1. There
we suggested that y and x could be considered to be different points in $E^n$
referred to the same coordinate system; that is, the coordinate system remains
fixed but the vectors change. Thus y = Rx has a dual interpretation:
It can be thought of as relating the coordinates of the same point
(vector) in two different coordinate systems or as moving the point x
into another point y, both points being referred to the same coordinate
system. The change of coordinate interpretation is called the alias interpretation,
while the interpretation of having the point move elsewhere
is called the alibi.

Figure 7-1

Example: The unit vectors $e_1 = [1, 0]$, $e_2 = [0, 1]$ define an orthogonal
coordinate system in $E^2$ with coordinates $x_1$, $x_2$. Any vector x can be
written $x = x_1 e_1 + x_2 e_2$. The set of basis vectors $a_1 = [1, \tfrac{1}{2}]$, $a_2 = [\tfrac{1}{4}, 2]$
also defines a coordinate system for $E^2$ (see Fig. 7-1) which is not orthogonal
since $a_1$ and $a_2$ are not orthogonal. Such a coordinate system is
often called oblique.

Let $\alpha_1$, $\alpha_2$ be the coordinates for x in the coordinate system defined by
$a_1$, $a_2$, that is, $x = \alpha_1 a_1 + \alpha_2 a_2$. The matrix whose columns are $e_1$, $e_2$ is
the identity matrix I. If $A = (a_1, a_2)$, then the matrix S which relates
the two sets of basis vectors is

$$S = \begin{bmatrix} 1 & \tfrac{1}{4} \\ \tfrac{1}{2} & 2 \end{bmatrix},$$

since A = IS = S. The matrix relating $\alpha = [\alpha_1, \alpha_2]$ and $x = [x_1, x_2]$
is $S^{-1}$, that is, $\alpha = S^{-1}x$. Thus

$$\begin{bmatrix} \alpha_1 \\ \alpha_2 \end{bmatrix} = \frac{4}{15} \begin{bmatrix} 4 & -\tfrac{1}{2} \\ -1 & 2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \quad \text{or} \quad \begin{aligned} \alpha_1 &= \tfrac{16}{15} x_1 - \tfrac{2}{15} x_2, \\ \alpha_2 &= -\tfrac{4}{15} x_1 + \tfrac{8}{15} x_2. \end{aligned} \tag{7-87}$$

Consider the vector x whose representation in the system defined by
$e_1$, $e_2$ is x = [1, 1]. The coordinates of x in the system defined by $a_1$, $a_2$
are [from (7-87)] $\alpha_1 = 14/15$, $\alpha_2 = 4/15$. Note that since $a_1$ does not have
unit length, $\alpha_1$ is not the distance measured along the $a_1$-axis from the
origin to the point where a line drawn through x parallel to $a_2$ intersects
the $a_1$-axis. This distance is $|\alpha_1 a_1|$, which is $\alpha_1 |a_1|$ because in our
example $\alpha_1$ is positive. Since $|a_1| = (1/2)\sqrt{5}$, the distance is
$(14/15)[(1/2)\sqrt{5}] = (7/15)\sqrt{5}$.
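
The coordinates in this example can be reproduced with exact arithmetic. The Python sketch below assumes the basis vectors $a_1 = [1, 1/2]$ and $a_2 = [1/4, 2]$, which are the values consistent with the coordinates 14/15 and 4/15 and the length $(1/2)\sqrt{5}$ quoted in the example; the helper name `inverse2` is illustrative.

```python
from fractions import Fraction as Fr

def inverse2(M):
    """Inverse of a nonsingular 2 x 2 matrix."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    if det == 0:
        raise ValueError("matrix is singular")
    return [[M[1][1] / det, -M[0][1] / det],
            [-M[1][0] / det, M[0][0] / det]]

a1 = [Fr(1), Fr(1, 2)]           # assumed first basis vector
a2 = [Fr(1, 4), Fr(2)]           # assumed second basis vector
S = [[a1[0], a2[0]],             # columns of S are a1 and a2
     [a1[1], a2[1]]]
S_inv = inverse2(S)

x = [Fr(1), Fr(1)]               # representation in the e1, e2 system
alpha = [sum(S_inv[i][j] * x[j] for j in range(2)) for i in range(2)]

# Rebuild x from its oblique coordinates: x = alpha_1 a1 + alpha_2 a2.
rebuilt = [alpha[0] * a1[i] + alpha[1] * a2[i] for i in range(2)]
print(alpha, rebuilt)
```

Rebuilding x from $\alpha_1 a_1 + \alpha_2 a_2$ is a convenient check that the inverse was applied in the right direction.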

7-15 Equivalence and similarity. Let us consider a linear transformation
T which maps points x in $E^n$ into points y in $E^m$ so that y = T(x).
The problems of Chapter 4 show that for fixed bases (i.e., given coordinate
systems) in $E^n$ and $E^m$, there exists a unique $m \times n$ matrix A such
that y = Ax. The matrix A is the representation of the linear transformation
T with respect to the given coordinate systems.*


* Here x, y refer not only to points in $E^n$ and $E^m$, respectively, but also to
the representation of these points in the coordinate systems for which the
representation of T is A.

Now, suppose that we introduce new coordinate systems into $E^n$ and
$E^m$. Let the new coordinates, $\hat{x}$, in $E^n$ be related to the original ones, x,
by $x = F\hat{x}$, and the new coordinates, $\hat{y}$, in $E^m$ to the original ones, y, by
$y = G\hat{y}$. Then in terms of the new coordinates, the transformation
y = Ax becomes

$$G\hat{y} = AF\hat{x} \quad \text{or} \quad \hat{y} = G^{-1}AF\hat{x} = B\hat{x}, \tag{7-88}$$

where $B = G^{-1}AF$. The matrix B is the representation of the linear
transformation T in the new coordinate systems and is equivalent to the
matrix A.

The preceding discussion shows that equivalent matrices can be thought
of as representing the same linear transformation in various coordinate
systems. In fact, if A represents the linear transformation T for given coordinate
systems in $E^n$ and $E^m$, and if B is any other matrix equivalent to
A, then there exist a coordinate system in $E^n$ and a coordinate system in
$E^m$ such that B is the representation of T for these coordinate systems.

Next let us concentrate our attention on the special case where m = n,
so that T maps points x in $E^n$ into points y in $E^n$. Assume that the points
x, y are referred to the same coordinate system in $E^n$. Then for this coordinate
system, there exists a unique matrix A such that y = Ax. If a
new coordinate system is introduced into $E^n$ and the new coordinates,
$\hat{x}$, $\hat{y}$, are related to the old ones by $x = S\hat{x}$, $y = S\hat{y}$ (note that the same S
appears in both equations, since both sets of vectors are referred to the
same coordinate system), then y = Ax becomes

$$S\hat{y} = AS\hat{x} \quad \text{or} \quad \hat{y} = S^{-1}AS\hat{x} = B\hat{x}. \tag{7-89}$$

Thus the matrix $B = S^{-1}AS$ which represents T in the new coordinate
system is similar to the matrix A which represented T in the old coordinate
system. If T “looked like” A in the original coordinate system, it “looks
like” $S^{-1}AS$ in the new coordinate system.
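
That A and $S^{-1}AS$ describe the same transformation in two coordinate systems can be checked on a small case. In the Python sketch below, the matrices A and S and the test vector are hypothetical, and `x_new`, `y_new` stand for the coordinates of x and y in the new system.

```python
from fractions import Fraction as Fr

def matmul(X, Y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def matvec(M, v):
    return [sum(a * b for a, b in zip(row, v)) for row in M]

def inverse2(M):
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    if det == 0:
        raise ValueError("matrix is singular")
    return [[M[1][1] / det, -M[0][1] / det],
            [-M[1][0] / det, M[0][0] / det]]

A = [[Fr(2), Fr(1)], [Fr(0), Fr(3)]]   # T in the old coordinates (sample values)
S = [[Fr(1), Fr(1)], [Fr(1), Fr(2)]]   # old coordinates = S times new coordinates
B = matmul(matmul(inverse2(S), A), S)  # T in the new coordinates, B = S^{-1} A S

x_new = [Fr(1), Fr(-1)]                # a point given in the new coordinates
x_old = matvec(S, x_new)               # the same point in the old coordinates
y_old = matvec(A, x_old)               # image computed in the old coordinates
y_new = matvec(B, x_new)               # image computed in the new coordinates

print(B)
print(y_old, matvec(S, y_new))         # both describe the same image point
```

Converting `y_new` back with S recovers `y_old`, which is the content of Eq. (7-89).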

Note: Similarity transformations are of frequent occurrence in physics.
Consider an anisotropic dielectric (nonconducting) material. If an electric
field is imposed on this material, there is a separation of charge, and the
material becomes polarized. The electric field f and the polarization p
are both vector quantities. Because the material is not isotropic, the direction
of the polarization vector will not, in general, lie along the same line
as the electric field vector. The vector p will be related to f by an equation
of the form p = Ef, where E is a third-order symmetric matrix, called the
dielectric tensor. The matrix E depends on the coordinate system to which
the vectors p, f are referred. Suppose that p = Ef when p, f are referred
to a coordinate system with coordinates $x_1$, $x_2$, $x_3$. Now a transformation
is made to a new coordinate system with coordinates $y_1$, $y_2$, $y_3$, and x =
Sy. Then in terms of this new coordinate system, the dielectric tensor is
$\hat{E} = S^{-1}ES$, and $\hat{p} = \hat{E}\hat{f}$, where $\hat{p}$, $\hat{f}$ denote p, f referred to the
y-coordinate system.

7-16 Rotation of coordinates; orthogonal transformations. Orthogonal 
coordinate systems are probably used more frequently in practice than 
any other type. By “rotating” such an orthogonal coordinate system it 
is often possible to simplify the equations of interest. Let us study the 
rotation of orthogonal coordinates in $E^2$. Consider Fig. 7-2. Imagine
that we begin with the $x_1x_2$-coordinate system. Another $y_1y_2$-coordinate
system is obtained by rotating the $x_1x_2$-system through the angle $\theta$, as
shown. The matrix relating the $y_1y_2$-coordinates to the $x_1x_2$-coordinates
will now be found. The vectors which define the $x_1x_2$-coordinate system
are $e_1 = [1, 0]$, $e_2 = [0, 1]$. The orthonormal vectors defining the
$y_1y_2$-coordinate system will be denoted by $\epsilon_1$, $\epsilon_2$. Any vector $v$ can be written

$$v = x_1 e_1 + x_2 e_2 = y_1 \epsilon_1 + y_2 \epsilon_2.$$

The component of $x_1 e_1$ along the $y_1$-axis is $x_1 \cos\theta$, and the component
of $x_2 e_2$ along the $y_1$-axis is $x_2 \sin\theta$. The sum of these components
must be $y_1$. Hence

$$y_1 = x_1 \cos\theta + x_2 \sin\theta.$$

Similarly,

$$y_2 = -x_1 \sin\theta + x_2 \cos\theta.$$


The transformation of coordinates in matrix form can be written $y = Qx$,
where

$$Q = \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix}. \tag{7-90}$$

The matrix $Q$ is orthogonal since $Q'Q = I$ (recall that $\sin^2\theta + \cos^2\theta = 1$).



Figure 7-2

The basis vectors $\epsilon_1$, $\epsilon_2$ for the rotated coordinate system are related
to $e_1$, $e_2$ by $(\epsilon_1, \epsilon_2) = (e_1, e_2)Q'$ because $Q^{-1} = Q'$. Thus $(\epsilon_1, \epsilon_2) = Q'$
or $\epsilon_1 = [\cos\theta, \sin\theta]$, $\epsilon_2 = [-\sin\theta, \cos\theta]$.
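The rotation-of-axes matrix of (7-90) is easy to experiment with. The sketch below, in Python, builds $Q$ for an assumed angle of 30 degrees, confirms $Q'Q = I$, and reads the rotated basis vectors off the rows of $Q$ (the columns of $Q'$). The function names are illustrative only.

```python
import math

def rotation_matrix(theta):
    """Q of (7-90): new coordinates after rotating the axes through theta."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, s], [-s, c]]

def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

theta = math.radians(30)            # assumed angle, for illustration
Q = rotation_matrix(theta)
QtQ = matmul(transpose(Q), Q)       # should be the identity matrix

# The point x = e1 = [1, 0] has new coordinates [cos(theta), -sin(theta)].
x = [1.0, 0.0]
y = [Q[0][0] * x[0] + Q[0][1] * x[1],
     Q[1][0] * x[0] + Q[1][1] * x[1]]

# The columns of Q' (the rows of Q) are the rotated basis vectors.
eps1, eps2 = Q[0], Q[1]
```

Note that the coordinates of a fixed point turn through $-\theta$ while the basis vectors turn through $+\theta$, which is why $\epsilon_1$ and $\epsilon_2$ come from $Q'$ rather than $Q$.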

The notion of rotating orthogonal coordinates can be generalized to $E^n$.
First, however, we shall develop a few more properties of orthogonal
matrices. An orthogonal matrix has been defined as one whose inverse is
its transpose; that is, $Q$ is orthogonal if $Q^{-1} = Q'$. If we denote the
columns of $Q$ by $q_j$, then since $Q'Q = I$, it follows that

$$q_i' q_j = \delta_{ij}, \tag{7-91}$$

and the columns of $Q$ form an orthonormal basis for $E^n$. If the rows of $Q$
are denoted by $q^i$, then since $QQ' = I$,

$$q^i (q^j)' = \delta_{ij}, \tag{7-92}$$

and the rows of $Q$ also form an orthonormal basis for $E^n$. Any matrix
whose columns are an orthonormal set of vectors is an orthogonal matrix,
and we have just shown that the rows therefore are also an orthonormal
set of vectors. Note that if $Q$ is orthogonal, $Q'$ and $Q^{-1}$ are also orthogonal.
From $Q'Q = I$, it follows that

$$|Q'Q| = |Q'|\,|Q| = |Q|^2 = 1 \quad \text{or} \quad |Q| = \pm 1. \tag{7-93}$$

The determinant of an orthogonal matrix can have only the values $\pm 1$.

If $Q_1$ and $Q_2$ are nth-order orthogonal matrices, then $Q_1Q_2$ is also an
orthogonal matrix since

$$(Q_1Q_2)'Q_1Q_2 = Q_2'Q_1'Q_1Q_2 = Q_2'IQ_2 = Q_2'Q_2 = I, \tag{7-94}$$

and hence

$$(Q_1Q_2)^{-1} = (Q_1Q_2)'.$$
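Properties (7-93) and (7-94) can be illustrated with two small orthogonal matrices. This is a sketch in Python under assumed example matrices (a rotation through 0.4 radian and a matrix that reverses one coordinate); the helper names are hypothetical.

```python
import math

def matmul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def transpose(M):
    return [list(col) for col in zip(*M)]

def is_orthogonal(Q, tol=1e-12):
    """True if Q'Q equals the identity to within tol."""
    P = matmul(transpose(Q), Q)
    n = len(Q)
    return all(abs(P[i][j] - (1.0 if i == j else 0.0)) < tol
               for i in range(n) for j in range(n))

def det2(M):
    """Determinant of a 2x2 matrix."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

c, s = math.cos(0.4), math.sin(0.4)
Q1 = [[c, s], [-s, c]]            # orthogonal, determinant +1
Q2 = [[1.0, 0.0], [0.0, -1.0]]    # orthogonal, determinant -1
Q12 = matmul(Q1, Q2)              # the product is again orthogonal

checks = (is_orthogonal(Q1), is_orthogonal(Q2), is_orthogonal(Q12))
dets = (det2(Q1), det2(Q2), det2(Q12))
```

All three checks succeed, and the determinants are $+1$, $-1$, and $-1$ up to rounding, consistent with $|Q_1Q_2| = |Q_1|\,|Q_2|$.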

It will be noted that for the $Q$ of (7-90), $|Q| = 1$. This $Q$ provided the
transformation of coordinates on rotation of axes through an angle $\theta$ in $E^2$.
An interesting geometrical interpretation can also be given to orthogonal
matrices with $|Q| = -1$. Consider the orthogonal coordinate system
defined by $e_1$, $e_2$, with coordinates $x_1$, $x_2$. We introduce a transformation
of coordinates $y = Qx$, where

$$Q = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}.$$

$Q$ is orthogonal and $|Q| = -1$. Thus $y_1 = x_1$ and $y_2 = -x_2$. The
new basis vectors $\epsilon_1$, $\epsilon_2$ are $\epsilon_1 = e_1$, $\epsilon_2 = -e_2$. The coordinate systems
are shown in Fig. 7-3.

Figure 7-3

We cannot obtain the new coordinate system by
rotating the coordinate system defined by $e_1$, $e_2$; instead we reflect the
$x_2$-axis in the origin, that is, we replace $x_2$ by $y_2 = -x_2$. Problem 7-57 will
ask the reader to show that any second-order orthogonal matrix can be
written as (7-90) or as

$$\begin{bmatrix} \cos\theta & \sin\theta \\ \sin\theta & -\cos\theta \end{bmatrix},$$

so that if $|Q| = 1$, $y = Qx$ can be interpreted as a rotation of axes, and
if $|Q| = -1$, $y = Qx$ can be interpreted as a rotation plus a reflection.
The latter is sometimes referred to as an improper rotation.
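The distinction between proper and improper rotations in $E^2$ can be made concrete. The following Python sketch assumes its input is already a 2 × 2 orthogonal matrix and uses the sign of the determinant to decide which of the two forms it has; the function name `classify` and the test angle are illustrative assumptions.

```python
import math

def classify(Q, tol=1e-9):
    """Classify a 2x2 orthogonal matrix as a proper or improper rotation.

    Returns (kind, theta) such that Q equals [[c, s], [-s, c]] when kind is
    'proper' and [[c, s], [s, -c]] when kind is 'improper', where
    c = cos(theta) and s = sin(theta).
    """
    det = Q[0][0] * Q[1][1] - Q[0][1] * Q[1][0]
    theta = math.atan2(Q[0][1], Q[0][0])   # the first row is [c, s] in both forms
    if abs(det - 1.0) < tol:
        return "proper", theta
    if abs(det + 1.0) < tol:
        return "improper", theta
    raise ValueError("determinant is not +1 or -1, so Q is not orthogonal")

theta = math.radians(50)                   # assumed angle, for illustration
c, s = math.cos(theta), math.sin(theta)
rotation = [[c, s], [-s, c]]               # form (7-90)
reflected = [[c, s], [s, -c]]              # rotation followed by y2 -> -y2
mirror = [[1.0, 0.0], [0.0, -1.0]]         # reflection of the x2-axis alone

results = [classify(M) for M in (rotation, reflected, mirror)]
```

The pure reflection is reported as an improper rotation with $\theta = 0$, in line with the interpretation of $|Q| = -1$ as a rotation plus a reflection.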

Consider the transformation $\beta = Q\alpha$ in $E^n$, where $Q$ is an orthogonal
matrix. This can be interpreted as a change of coordinates. If the $\alpha_j$ are
coordinates relative to an orthonormal basis $a_1, \ldots, a_n$, then $A =
(a_1, \ldots, a_n)$ is an orthogonal matrix. The basis vectors for the system
whose coordinates are $\beta$ will be denoted by $b_1, \ldots, b_n$. Then if $B =
(b_1, \ldots, b_n)$, $B = AQ'$, and $B$ is an orthogonal matrix since the product
of orthogonal matrices is an orthogonal matrix. Thus $b_1, \ldots, b_n$ form
an orthonormal set and define an orthogonal coordinate system. In the
coordinate system defined by the $a_j$, the length of any vector $x = \sum_{j=1}^{n} \alpha_j a_j$
is the square root of $x'x = \sum_{i,j} \alpha_i \alpha_j a_i' a_j = \sum_{j=1}^{n} \alpha_j^2 = \alpha'\alpha$. In the
coordinate system defined by the $b_j$, the length of $x = \sum_{j=1}^{n} \beta_j b_j$ is then the square
root of $\sum_{j=1}^{n} \beta_j^2 = \beta'\beta$. Thus

$$\beta'\beta = \sum_{j=1}^{n} \beta_j^2 = \sum_{j=1}^{n} \alpha_j^2 = \alpha'\alpha.$$

This relation also follows directly from $\beta = Q\alpha$ since

$$\beta'\beta = \alpha'Q'Q\alpha = \alpha'I\alpha = \alpha'\alpha. \tag{7-95}$$

The above discussion suggests that any transformation $\beta = Q\alpha$ ($Q$
orthogonal) can be considered to give the transformation of coordinates
on rotation of an orthogonal coordinate system in $E^n$. The rotation will
be either proper or improper, depending on whether $|Q| = 1$ or $|Q| = -1$.
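The invariance of length expressed by (7-95) can be checked in $E^3$. The sketch below, in Python, builds an orthogonal $Q$ as the product of two rotations of axes (the angles 0.3 and 0.7 radian and the vector $\alpha$ are assumptions made for the example) and compares $\beta'\beta$ with $\alpha'\alpha$.

```python
import math

def matmul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def matvec(M, v):
    """Product of a matrix and a vector."""
    return [sum(M[i][k] * v[k] for k in range(len(v))) for i in range(len(M))]

def dot(u, v):
    """Scalar product u'v."""
    return sum(a * b for a, b in zip(u, v))

def rot_about_axis3(theta):
    """Rotation of axes through theta about the third coordinate axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, s, 0.0], [-s, c, 0.0], [0.0, 0.0, 1.0]]

def rot_about_axis1(phi):
    """Rotation of axes through phi about the first coordinate axis."""
    c, s = math.cos(phi), math.sin(phi)
    return [[1.0, 0.0, 0.0], [0.0, c, s], [0.0, -s, c]]

Q = matmul(rot_about_axis1(0.7), rot_about_axis3(0.3))  # orthogonal, |Q| = 1
alpha = [1.0, 2.0, -2.0]       # coordinates in the original system
beta = matvec(Q, alpha)        # coordinates in the rotated system

alpha_sq = dot(alpha, alpha)   # squared length in the old coordinates
beta_sq = dot(beta, beta)      # squared length in the new coordinates
```

Both squared lengths equal 9 up to rounding error, although the individual coordinates differ.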

We have seen that any quadratic form $x'Ax$ can be diagonalized by a
transformation of variables $x = Qy$ or $y = Q'x$, where $Q$ is an orthogonal
matrix whose columns are an orthonormal set of eigenvectors for $A$.
This transformation of variables can be interpreted geometrically as a
rotation of axes. The points $x$ may be thought of as being referred to the
orthogonal coordinate system defined by the unit vectors $e_1, \ldots, e_n$.
The orthonormal basis vectors $u_1, \ldots, u_n$ which define the coordinate system
for the coordinates $y$ are obtained from $(u_1, \ldots, u_n) = IQ = Q$.
Hence the $u_j$ are the set of orthonormal eigenvectors for $A$. A set of orthonormal
eigenvectors for $A$ defines a coordinate system in which $x'Ax$ is
diagonal. The $u_j$ are said to define the principal axes for the quadratic
form.
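For a form in two variables the principal axes can be found in closed form. The Python sketch below is one way to do this, not a method taken from the text: it assumes the symmetric matrix $A = [[a, b], [b, d]]$, chooses the rotation angle from $\tan 2\theta = 2b/(a - d)$, and forms $D = Q'AQ$. The example form $2x_1^2 + 2x_1x_2 + 2x_2^2$ is an assumption chosen so the result is easy to verify by hand.

```python
import math

def principal_axes(a, b, d):
    """Diagonalize x'Ax for the symmetric matrix A = [[a, b], [b, d]].

    Returns (theta, Q, D), where the columns of Q are the orthonormal
    vectors u1 = [cos(theta), sin(theta)], u2 = [-sin(theta), cos(theta)]
    and D = Q'AQ.
    """
    theta = 0.5 * math.atan2(2.0 * b, a - d)   # angle that removes the cross term
    c, s = math.cos(theta), math.sin(theta)
    Q = [[c, -s], [s, c]]
    A = [[a, b], [b, d]]
    AQ = [[sum(A[i][k] * Q[k][j] for k in range(2)) for j in range(2)]
          for i in range(2)]
    D = [[sum(Q[k][i] * AQ[k][j] for k in range(2)) for j in range(2)]
         for i in range(2)]
    return theta, Q, D

# Example: 2*x1^2 + 2*x1*x2 + 2*x2^2, i.e. A = [[2, 1], [1, 2]].
theta, Q, D = principal_axes(2.0, 1.0, 2.0)
```

For this example the principal axes are turned through 45 degrees from the original axes, and in the new variables the form reads $3y_1^2 + y_2^2$, so the diagonal entries of `D` are the eigenvalues 3 and 1.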
Problems


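7-1. Consider the polynomial

$$f(\lambda) = (-\lambda)^n + b_{n-1}(-\lambda)^{n-1} + \cdots + b_1(-\lambda) + b_0.$$

By induction or by some different method, prove that

$$b_{n-1} = \sum_{i=1}^{n} \lambda_i = \lambda_1 + \lambda_2 + \cdots + \lambda_n,$$

$$b_{n-2} = \sum_{j>i} \lambda_i\lambda_j = \lambda_1\lambda_2 + \cdots + \lambda_1\lambda_n + \lambda_2\lambda_3 + \cdots + \lambda_2\lambda_n + \cdots + \lambda_{n-1}\lambda_n,$$

$$b_{n-r} = \sum_{k > \cdots > j > i} \lambda_i\lambda_j \cdots \lambda_k \quad \text{(each term a product of } r \text{ of the } \lambda_i\text{)},$$

$$b_0 = \lambda_1\lambda_2 \cdots \lambda_{n-1}\lambda_n,$$

where the $\lambda_i$ are the $n$ roots of $f(\lambda) = 0$.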
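7-2. Consider the equation

$$|A - \lambda I| = (-\lambda)^n + b_{n-1}(-\lambda)^{n-1} + \cdots + b_1(-\lambda) + b_0.$$

By induction or some different method prove that

$$b_{n-1} = \sum_{i=1}^{n} a_{ii} = a_{11} + a_{22} + \cdots + a_{nn},$$

$$b_{n-2} = \sum_{j>i} \begin{vmatrix} a_{ii} & a_{ij} \\ a_{ji} & a_{jj} \end{vmatrix},$$

$$b_{n-r} = \text{sum of the } n!/r!(n-r)! \text{ principal minors of } r\text{th order which preserve the natural column order},$$

$$b_0 = |A|.$$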


7-3. Find the eigenvalues and a set of orthonormal eigenvectors for the matrix



7-4. Find the eigenvalues and two different orthonormal sets of eigenvectors
for the matrix

$$A = \begin{bmatrix} 7 & 0 & 0 \\ 0 & 3 & 0 \\ 0 & 0 & 3 \end{bmatrix}.$$

7-5. Find the eigenvalues and a set of orthonormal eigenvectors for

$$A = \begin{bmatrix} 3 & 0 & 0 \\ 0 & 2 & 5 \\ 0 & 5 & 4 \end{bmatrix}.$$

7-6. For the matrix of Problem 7-5, form the matrix Q whose columns are 
a set of orthonormal eigenvectors for A. Show that Q'AQ is a diagonal matrix 
whose diagonal elements are the eigenvalues of A. 

7-7. Prove that if all eigenvalues of an nth-order symmetric matrix A are 
different from zero, the rank of A is n. Prove that if 0 is an eigenvalue of A of 
multiplicity k, then r(A) = n — k. 

7-8. Show that it is impossible for any 2X2 symmetric matrix of the form 



($b \neq 0$)


to have two identical eigenvalues. 

7-9. In Problem 2-18 it was shown that if we have two subspaces $S_n'$, $S_n''$ of
$E^n$ having dimensions $d'$, $d''$, respectively, and if $S_n' \cap S_n'' = \{0\}$, then $d$, the
dimension of $S_n' + S_n''$, is given by $d = d' + d''$. Show that under these assumptions,
any vector $a$ in $S_n' + S_n''$ can be written uniquely as $a = u_1 + u_2$,
$u_1 \in S_n'$, $u_2 \in S_n''$. Then $u_1$, $u_2$ are called the projections of $a$ on the subspaces
$S_n'$, $S_n''$, respectively, and $S_n' + S_n''$ is called the direct sum of $S_n'$ and $S_n''$. Illustrate
this geometrically when $S_n'$ is the subspace generated by $[1, 2]$, and $S_n''$ the subspace
generated by $[-1, 3]$. What is the projection of $[3, 3]$ on $S_n'$ and on $S_n''$?

More generally, if we have $k$ subspaces of $E^n$, $S_n^{(1)}, \ldots, S_n^{(k)}$, and $S_n' \cap S_n'' =
\{0\}$, where $S_n'$ is any partial sum of the $S_n^{(i)}$, and $S_n''$ is any partial sum of the
$S_n^{(i)}$ containing a set of $S_n^{(i)}$ different from those in $S_n'$, then $S_n = \sum_{i=1}^{k} S_n^{(i)}$ is
called the direct sum of the $S_n^{(i)}$. This condition requires that $0$ is the only vector
common to any different partial sums of the $S_n^{(i)}$. In particular, this condition
requires that $S_n^{(i)} \cap S_n^{(j)} = \{0\}$, $i \neq j$, $(S_n^{(i)} + S_n^{(j)}) \cap S_n^{(k)} = \{0\}$, $i \neq j \neq k$, etc.
Show that if $a$ is any vector in $S_n$, it can be written uniquely as $a = \sum_j u_j$,
$u_j \in S_n^{(j)}$; $u_j$ is called the projection of $a$ on the subspace $S_n^{(j)}$. The notion of a
direct sum is a generalization of the basis concept. Give a geometrical illustration
of a direct sum in $E^3$.

Let $S_n^{(j)}$ be the subspace generated by the eigenvectors for the nth-order symmetric
matrix $A$ corresponding to the eigenvalue $\lambda_j$. Show that $E^n$ is the direct
sum of these subspaces.

7-10. Prove that if $Q_1, \ldots, Q_n$ are orthogonal, then

$$Q = \begin{bmatrix} Q_1 & 0 & \cdots & 0 \\ 0 & Q_2 & \cdots & 0 \\ \vdots & & & \vdots \\ 0 & 0 & \cdots & Q_n \end{bmatrix}$$

is also orthogonal.

7-11. Compute the number of independent elements in an nth-order orthogonal
matrix.

7-12. Consider the nth-order symmetric matrix $A$. Let $\lambda_1$ be an eigenvalue
of $A$, and $u_1$ an eigenvector with eigenvalue $\lambda_1$. Show that there exist vectors
$v_1, \ldots, v_{n-1}$ such that $Q_1 = (u_1, v_1, \ldots, v_{n-1})$ is an orthogonal matrix. Then
show that

$$Q_1'AQ_1 = \begin{bmatrix} \lambda_1 & 0 \\ 0 & A_1 \end{bmatrix},$$

where $A_1$ is a symmetric matrix of order $n - 1$. Show that the remaining $n - 1$
eigenvalues of $A$ are the $n - 1$ eigenvalues of $A_1$. Let $\lambda_2$ be any eigenvalue of
$A_1$ (and hence of $A$). Assume that $\hat{u}_2$ is an eigenvector of $A_1$ with eigenvalue $\lambda_2$.
Consider $P = (\hat{u}_2, \hat{v}_1, \ldots, \hat{v}_{n-2})$, where $\hat{v}_1, \ldots, \hat{v}_{n-2}$ are chosen such that $P$
is orthogonal. Prove that

$$P'A_1P = \begin{bmatrix} \lambda_2 & 0 \\ 0 & A_2 \end{bmatrix},$$

where $A_2$ is a symmetric matrix of order $n - 2$. Now form the matrix

$$Q_2 = \begin{bmatrix} 1 & 0 \\ 0 & P \end{bmatrix}.$$

Show that $Q_2$ is orthogonal and that

$$Q_2' \begin{bmatrix} \lambda_1 & 0 \\ 0 & A_1 \end{bmatrix} Q_2 = \begin{bmatrix} \lambda_1 & 0 & 0 \\ 0 & \lambda_2 & 0 \\ 0 & 0 & A_2 \end{bmatrix}.$$

Next prove by induction or some different method that there exists an orthogonal
matrix $Q$ such that $Q'AQ$ is a diagonal matrix with the eigenvalues of $A$
as its diagonal elements. Many texts use this method of proving that $A$ is
similar to a diagonal matrix. Does this method of proof demonstrate that (1) $Q$
contains an orthonormal set of eigenvectors for $A$; (2) if an eigenvalue has
multiplicity $k$, then the eigenvectors of $A$ with that eigenvalue span a $k$-dimensional subspace of $E^n$?

7-13. Prove that if $A$ is a symmetric matrix and $Q$ an orthogonal matrix
such that $D = Q'AQ$ is diagonal, then $Q$ must have as its columns a set of
orthonormal eigenvectors of $A$. Prove also that the diagonal elements of $D$ must
be the eigenvalues of $A$.

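7-14. Prove that two commuting nth-order symmetric matrices $A$, $B$ are
simultaneously diagonalizable by an orthogonal similarity transformation, that
is, there exists an orthogonal matrix $Q$ such that $Q'AQ$ and $Q'BQ$ are diagonal.
Hint: Let $\lambda_1, \ldots, \lambda_r$ denote the different eigenvalues of $A$ and assume that $\lambda_j$
has multiplicity $m_j$. Then there exists an orthogonal matrix $Q_1$ such that

$$D_1 = Q_1'AQ_1 = \begin{bmatrix} \lambda_1 I_{m_1} & 0 & \cdots & 0 \\ 0 & \lambda_2 I_{m_2} & \cdots & 0 \\ \vdots & & & \vdots \\ 0 & 0 & \cdots & \lambda_r I_{m_r} \end{bmatrix}.$$

Show that $D_1$ commutes with $Q_1'BQ_1$ and prove that $Q_1'BQ_1$ must have the
form (see Problem 3-11)

$$Q_1'BQ_1 = \begin{bmatrix} B_1 & 0 & \cdots & 0 \\ 0 & B_2 & \cdots & 0 \\ \vdots & & & \vdots \\ 0 & 0 & \cdots & B_r \end{bmatrix}.$$

Show that each $B_i$ is symmetric. Let $P_i$ be an orthogonal matrix such that
$P_i'B_iP_i$ is diagonal. Then consider the matrix

$$Q_2 = \begin{bmatrix} P_1 & 0 & \cdots & 0 \\ 0 & P_2 & \cdots & 0 \\ \vdots & & & \vdots \\ 0 & 0 & \cdots & P_r \end{bmatrix}.$$

Show that $Q = Q_1Q_2$ is orthogonal and simultaneously diagonalizes $A$ and $B$.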
7-15. Consider the matrices

$$A = \begin{bmatrix} 3 & 4 & 0 & 0 \\ 4 & 2 & 0 & 0 \\ 0 & 0 & -3 & 2 \\ 0 & 0 & 2 & 5 \end{bmatrix}, \quad B = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}.$$

Show that $A$, $B$ commute. Find the matrix $Q$ such that $Q'AQ$, $Q'BQ$ are diagonal.

7-16. Let $A$ be an nth-order nonsingular symmetric matrix and $u$ an eigenvector
of $A$ with eigenvalue $\lambda$. Show that $u$ is an eigenvector of $A^{-1}$ with eigenvalue
$1/\lambda$, i.e., if $\lambda$ is an eigenvalue of $A$, then $1/\lambda$ is an eigenvalue of $A^{-1}$.
Furthermore $A$, $A^{-1}$ have the same set of eigenvectors. Can $A$ have an eigenvalue 0?

7-17. Show that the eigenvalues of the transpose of the square (not necessarily
symmetric) matrix $A$ are the same as those of $A$. Are the eigenvectors of $A'$
the same as those of $A$?

7-18. Demonstrate that if $x$ satisfies $Ax = \lambda x$, then $A^n x = \lambda^n x$, so that if
$\lambda$ is an eigenvalue of $A$, then $\lambda^n$ is an eigenvalue of $A^n$; furthermore, $A$, $A^n$ have
the same set of eigenvectors. Show directly that if $Q'AQ = D$ is diagonal, then
$Q'A^nQ = D^n$.

7-19. Prove that if $x$ satisfies $Ax = \lambda x$ and $P(A)$ is a matrix polynomial in
$A$, then $P(A)x = P(\lambda)x$.

7-20. Show that if $x$ is an eigenvector of $A$ with eigenvalue $\lambda_i$ and $y$ is an
eigenvector of $A'$ with eigenvalue $\lambda_j$ ($\lambda_i \neq \lambda_j$), then $y'x = 0$. Note that $A$ does
not need to be symmetric.

7-21. Consider the matrix 



Find the eigenvalues of A and a set of eigenvectors for A and A'. Show that 
in this special case the results of Problem 7-20 hold. 

7-22. If $\lambda_1$ is the largest eigenvalue of the symmetric matrix $A$, prove that

$$\lambda_1 = \max_{x} \frac{x'Ax}{x'x}$$

when $x$ is allowed to range over all of $E^n$. Hint: Express $x$ as a linear combination
of the eigenvectors of $A$.

7-23. Consider the symmetric matrix $A$. Let $x_0$ be any vector in $E^n$. Compute
$x_1 = Ax_0$, $x_2 = Ax_1$, etc., $x_n = Ax_{n-1}$. If $|\lambda_1|$ is the “largest” eigenvalue
of $A$ and has multiplicity 1 (is not repeated), then show that as $n \to \infty$, $x_n$
becomes proportional to $\lambda_1^n u_1$, where $u_1$ is the eigenvector of $A$ with eigenvalue
$\lambda_1$, provided that $x_0$ is not orthogonal to $u_1$. How can this result be used to compute
numerically the largest eigenvalue of $A$ and its corresponding eigenvector?
Hint: Write $x_0$ as a linear combination of the eigenvectors of $A$.

7-24. In Problem 7-23, show that if the multiplicity of $\lambda_1$ is greater than
unity, the process can lead to one eigenvector with eigenvalue $\lambda_1$.

7-25. Compute the exact value for the largest eigenvalue of 



and its eigenvector. Using the technique described in Problem 7-23, try to
determine approximately the eigenvalue and eigenvector; start with $x_0 = [1, 0]$.

7-26. Consider the inhomogeneous eigenvalue problem

$$Ax - \lambda x = b \quad (A \text{ symmetric}, \; b \neq 0).$$

Show that there is a solution $x$ provided that $\lambda$ is not one of the eigenvalues $\lambda_j$
of $A$. Show that any solution $x$ can be written

$$x = \sum_{j} \frac{u_j'b}{\lambda_j - \lambda}\, u_j,$$

where the $u_j$ form an orthonormal set of eigenvectors of $A$. Hint: Write $x = \sum_j \alpha_j u_j$
and evaluate the $\alpha_j$. The problem is called inhomogeneous because $\alpha x$ is not a
solution if $x$ is for all scalars $\alpha$.

7-27. In Problem 7-26, let

$$A = \begin{bmatrix} 4 & 1 \\ 1 & 3 \end{bmatrix}, \quad b = \begin{bmatrix} 2 \\ 1 \end{bmatrix}.$$

Write $x$ as a function of $\lambda$.

7-28. Find the eigenvalues and eigenvectors of unit length for

$$A = \begin{bmatrix} 2 & 4 \\ 3 & 1 \end{bmatrix}.$$

Show that the eigenvectors are linearly independent, but not orthogonal.

7-29. For the matrix $A$ of Problem 7-28, find a matrix $P$ such that $P^{-1}AP$
is a diagonal matrix whose diagonal elements are the eigenvalues of $A$. Carry
out the multiplication to show that $P^{-1}AP$ is diagonal.

7-30. Find the eigenvalues and eigenvectors of

$$A = \begin{bmatrix} 2 & -1 \\ 1 & 0 \end{bmatrix}.$$

In this case, the two eigenvalues are equal, but there is only a single linearly
independent eigenvector. The nullity of $A - \lambda I$ is not the multiplicity of the
eigenvalue $\lambda$.

7-31. By working directly with $|A - \lambda I|$, prove that similar matrices $A$, $B$
have the same characteristic polynomial.

7-32. Show that the symmetric matrix A is positive definite if and only if 
there exists a nonsingular matrix P such that A = P'P. What is P? Hint: See 
Section 7-13. 

7-33. Write the following matrix in the form $A = P'P$:

$$A = \begin{bmatrix} 3 & 1 \\ 1 & 2 \end{bmatrix}.$$


7-34. Show that if the symmetric matrix $A$ is positive (negative) semidefinite,
then $|A| = 0$, and $A$ is singular.

7-35. Write the following quadratic forms in simplified matrix notation
$x'Ax$, with $A$ being a symmetric matrix:

(a) $3x_1^2 + 2x_1x_2 + 4x_2^2$;

(b) $3x_1^2 + x_2^2 + 5x_3^2 + 4x_1x_2 + 2x_1x_3 + 6x_2x_3$;

(c) $4x_1^2 + 3x_2^2$.

Check to determine whether each of the forms is positive definite.

7-36. Prove that a nonsingular transformation of variables can reduce any
quadratic form $x'Ax$ not identically zero to a form with the leading coefficient
$a_{11} \neq 0$. Hint: Examine the two-variable case in Section 7-11.

7-37. Using the results of Problem 7-36, prove that a nonsingular transformation
of variables can reduce any quadratic form $x'Ax$ to $\sum_j \gamma_j y_j^2$, where
each $\gamma_j$ can be positive, negative, or zero.

7-38. Prove in detail that the nonsingular transformation introduced in 
Section 7-11 to diagonalize positive definite forms is such that S is a triangular 
matrix. 

7-39. Find a triangular matrix $R$ which diagonalizes

$$F = 9x_1^2 + 2x_1x_2 + 2x_2^2.$$

Determine the resulting diagonal form of $F$.

7-40. Find an orthogonal transformation of variables which diagonalizes $F$
of Problem 7-39. Determine the diagonal form of $F$ under this transformation.

7-41. Find the transformation of variables which diagonalizes

$$F = 4x_1^2 - 2x_2^2 + x_3^2 - 2x_1x_2 + 4x_1x_3 - 5x_2x_3,$$

using the technique of completing the square developed in Problem 7-37.
What is the diagonal form of $F$?

7-42. Prove that if $x'Ax$ is positive definite and $A^{-1}$ exists, then $x'A^{-1}x$ is
also positive definite. Hint: Consider the transformation $x = A^{-1}y$.

7-43. Given the form $F = a_{11}x_1^2 + 2a_{12}x_1x_2 + a_{22}x_2^2$. For what values of
$a_{11}$, $a_{12}$, $a_{22}$, and $F$ does this form describe a circle, ellipse, hyperbola, a pair of
lines, a point, or all of $E^2$?

7-44. According to Section 7-11, there exists a nonsingular, nonorthogonal
transformation of variables which reduces a positive definite form $x'Ax$ to a
sum of squares $\sum_i \alpha_i y_i^2$. By induction or by any other method prove that

$$\alpha_1 = a_{11}; \quad \alpha_2 = \frac{\begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix}}{a_{11}}; \quad \alpha_3 = \frac{\begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix}}{\begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix}}; \quad \ldots; \quad \alpha_n = \frac{|A|}{\begin{vmatrix} a_{11} & \cdots & a_{1,n-1} \\ \vdots & & \vdots \\ a_{n-1,1} & \cdots & a_{n-1,n-1} \end{vmatrix}}.$$

7-45. Show that if any quadratic form $x'Ax$ is diagonalized to $\sum_j \gamma_j y_j^2$ by a
nonsingular transformation of variables, the number of positive and negative $\gamma_j$
is uniquely determined and is independent of the particular transformation
which diagonalized the form. This result is called Sylvester’s law of inertia.
Hint: Assume the contrary! Suppose that there exist two nonsingular transformations
$x = R_1y$, $x = R_2z$ such that we obtain

$$\alpha_1 y_1^2 + \cdots + \alpha_p y_p^2 - \alpha_{p+1} y_{p+1}^2 - \cdots - \alpha_r y_r^2 \quad (\alpha_j > 0 \text{ for all } j)$$

and

$$\beta_1 z_1^2 + \cdots + \beta_q z_q^2 - \beta_{q+1} z_{q+1}^2 - \cdots - \beta_r z_r^2 \quad (\beta_i > 0 \text{ for all } i).$$

(Why can we assume that $r$ is the same in both cases?) Assume that $q < p$.
Set $z_i = 0$ ($i = 1, \ldots, q$) and $y_j = 0$ ($j = p + 1, \ldots, n$). Then $q + n - p < n$
homogeneous linear equations restrict the values of the $x_i$. There exists
a solution $x \neq 0$. However, the first form must be $\geq 0$, and the second $\leq 0$.
Hence the form $x'Ax$ must vanish for any such $x$, and $\alpha_1 y_1^2 + \cdots + \alpha_p y_p^2 = 0$.
This implies $y = 0$, which in turn implies $x = R_1y = 0$, and this contradicts
the fact that there was a solution $x \neq 0$. Fill in the details. Note that according
to this theorem all congruence transformations $P'AP$ ($P$ nonsingular) which
diagonalize the symmetric matrix $A$ give the same number of positive and
negative diagonal elements.

7-46. Find a nonsingular transformation of variables which simultaneously 
diagonalizes 



Give the diagonal form of A and B. 

7-47. Show that the eigenvalues of the matrix $C$ in Eq. 7-75 are the roots of
$|B - \lambda A| = 0$.

7-48. A bilinear form in the variables $y_1, \ldots, y_n$ and $x_1, \ldots, x_n$ is defined
to be the expression

$$\sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij} y_i x_j = y'Ax.$$

Show that it no longer follows that $A$ can be assumed to be symmetric.
Furthermore, show that by the nonsingular transformation of variables $y = R_1v$,
$x = R_2u$, the bilinear form can be reduced to $v'u$ if $A$ is nonsingular. What happens
when $A$ is singular?

7-49. A nonhomogeneous quadratic function of $n$ variables $x_1, \ldots, x_n$ is
defined to be the expression

$$F = \sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij} x_i x_j + \sum_{j=1}^{n} b_j x_j + c = x'Ax + bx + c,$$

where $A = \|a_{ij}\|$, $b = (b_1, \ldots, b_n)$. Consider a change of variables $x = y + r$,
where $r = [r_1, \ldots, r_n]$, that is, a translation. Express the form $F$ in terms of
the variables $y$ and show that the translation leaves the matrix $A$ unchanged.
Why is the function called nonhomogeneous?

7-50. For the nonhomogeneous function defined in Problem 7-49, show that
by an affine transformation of variables, $x = Qy + r$, with $Q$ orthogonal, $F$
can be reduced to one of the following forms:

$$F = \lambda_1 y_1^2 + \cdots + \lambda_r y_r^2 + h y_{r+1} \quad (h > 0)$$

or

$$F = \lambda_1 y_1^2 + \cdots + \lambda_r y_r^2 + g,$$

where the $\lambda_i$ are the eigenvalues of $A$, and each $\lambda_i \neq 0$ (the eigenvalues are
numbered in such a way that the nonzero values appear first). Hint: Apply
$x = Q_1z$ to diagonalize $x'Ax$, and complete the square to eliminate the linear
terms for nonzero $\lambda_i$. Next take care of the remaining linear terms. Note that $h$
is needed to make the final transformation orthogonal. If we wish to obtain an
orthogonal transformation of variables $x = Qy$ which will take $fx$ into $dy_1$
($d > 0$), then $fQy = dy_1$. What is the vector $fQ$? What is $d$? Does this yield
the first column of $Q$? How can we obtain the remaining columns of $Q$?

7-51. Reduce the following inhomogeneous quadratic function to one of the 
forms discussed in Problem 7-50: 


F — 2x? — 4xiX2 — + 3xi -|- 4x2 7. 


7-52. If $F_1$ is a positive definite quadratic form in the variables $x_1, \ldots, x_k$
and $F_2$ is a positive definite quadratic form in the variables $x_{k+1}, \ldots, x_n$,
consider the quadratic form $F = F_1 + F_2$. If $A_1$, $A_2$ are the matrices associated
with $F_1$, $F_2$, respectively, what is the matrix $A$ associated with $F$? Prove that
$F$ is positive definite.

7-53. Given any square matrix $A$, show that if $|a_{ii}| > \sum_{j \neq i} |a_{ij}|$ for each $i$,
then $A$ is nonsingular. Thus show that if $\lambda$ is any eigenvalue of $A$, then
$|a_{ii} - \lambda| \leq \sum_{j \neq i} |a_{ij}|$ for one or more $i$. Thus, bounds on the eigenvalues of $A$
can be found. Use this procedure to determine bounds on the eigenvalues of



1 5 
-2 4 
4 9 


Hint: To prove the first part of the problem, assume that $Ax = 0$ has a solution
$x \neq 0$. Let $x_i$ be the largest component of $x$ in absolute value. Consider the $i$th
equation. Is a contradiction obtained?

7-54. A linear transformation $T$ which takes vectors $x$ in $E^n$ into vectors $y$
in $E^n$, $y = T(x)$, is called orthogonal if the scalar product is preserved, that is,
$y'y = x'x$. Show that the matrix $Q$ representing $T$ is orthogonal if $y$, $x$ are referred
to orthonormal bases (orthogonal coordinates) in $E^n$.





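7-55. It was shown in the text that $y = Qx$, where

$$Q = \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix},$$

can be interpreted as a rotation of an orthogonal coordinate system through an
angle $\theta$. Give the alibi interpretation of $y = Qx$ and show that it rotates
vectors through the angle $-\theta$.

7-56. Show that for some $\theta$, any orthogonal matrix $Q$ of order 2 with $|Q| = 1$
can be written in the form given in Problem 7-55.

7-57. Show that any orthogonal matrix $Q$ of order 2 with $|Q| = -1$ can
be written

$$Q_1 = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix} \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix} \quad \text{or} \quad Q_2 = \begin{bmatrix} -1 & 0 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix}.$$

What is the geometrical interpretation of $Q_1$ and $Q_2$? Note that if in $Q_2$, $\theta$ is
replaced by $\theta + \pi$, $Q_1$ is obtained. Thus $Q_2$ does not really differ from $Q_1$.
Illustrate this graphically.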

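7-58. Consider the oblique coordinate system for $E^2$ determined by the basis
vectors $a_1$, $a_2$, where $\gamma$ is the angle between $a_1$, $a_2$, and assume that this coordinate
system is rotated through the angle $\theta$. Denote the new coordinates
by $\beta_1$, $\beta_2$. Show that

$$\begin{bmatrix} \alpha_1 \\ \alpha_2 \end{bmatrix} = \frac{1}{\sin\gamma} \begin{bmatrix} \sin(\gamma - \theta) & -\sin\theta \\ \sin\theta & \sin(\gamma + \theta) \end{bmatrix} \begin{bmatrix} \beta_1 \\ \beta_2 \end{bmatrix}.$$

Is the matrix representing this transformation an orthogonal matrix?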

7-59. Suppose that we begin with an orthogonal coordinate system whose
coordinates are $x_1, x_2, x_3$. Now we shall perform three counterclockwise rotations:
(1) through an angle $\theta$ about the $x_3$-axis to yield a new set of coordinates
$y_1, y_2, y_3 = x_3$; (2) through an angle $\phi$ about the $y_1$-axis to yield the set of coordinates
$z_1 = y_1, z_2, z_3$; and (3) through an angle $\psi$ about the $z_3$-axis to yield
the set of coordinates $v_1, v_2, v_3 = z_3$. Find the matrix which relates the coordinates
$v_1, v_2, v_3$ to $x_1, x_2, x_3$, and show that it is orthogonal. The angles
$\theta$, $\phi$, $\psi$ are called eulerian angles; they are of considerable use in rigid body
mechanics.

7-60. Consider a rectangular coordinate system with coordinates $x_1, x_2, x_3$.
A counterclockwise rotation through an angle $\theta$ about the $x_3$-axis yields a new
set of coordinates $y_1, y_2, y_3 = x_3$. The new coordinates are related to the
original set by the matrix $Q_1$. Find $Q_1$. Next, a counterclockwise rotation
through an angle $\phi$ about the $y_2$-axis gives the coordinates $z_1, z_2 = y_2, z_3$.
The $z$-coordinates are related to the $y$-coordinates by the matrix $Q_2$. Find $Q_2$.
Show that $Q_1Q_2 \neq Q_2Q_1$. What does this mean geometrically?

7-61. A dyadic $\mathfrak{B}$ of nth order is defined as the generalized quadratic form

$$\mathfrak{B} = \sum_{i,j=1}^{n} b_{ij} e_i e_j,$$

where the $e_i$ are the unit vectors for $E^n$. The combination $e_ie_j$ does not denote
a scalar product. The two unit vectors are merely written side-by-side, and the
notation $e_ie_j$ does not imply any additional significance. However, it is not
true that $e_ie_j = e_je_i$, that is, the order of the subscripts is important. For this
reason, the matrix $B = \|b_{ij}\|$ can no longer be assumed to be a symmetric
matrix. The left scalar product of a column vector $v$ and the dyadic $\mathfrak{B}$ is defined
as

$$v \cdot \mathfrak{B} = \sum_{i,j=1}^{n} b_{ij} (v'e_i) e_j = \sum_{i,j=1}^{n} b_{ij} v_i e_j,$$

where $(v'e_i)$ is the scalar product of $v$ and $e_i$. Thus the scalar product $v \cdot \mathfrak{B}$
is a vector. The right scalar product is defined to be

$$\mathfrak{B} \cdot v = \sum_{i,j=1}^{n} b_{ij} e_i (e_j'v) = \sum_{i,j=1}^{n} b_{ij} v_j e_i.$$

Under what conditions is $v \cdot \mathfrak{B} = \mathfrak{B} \cdot v$? Show that $v_1 \cdot \mathfrak{B} \cdot v_2$ is uniquely defined.
What is $x \cdot \mathfrak{B} \cdot x$? Evaluate $e_i \cdot \mathfrak{B}$, $\mathfrak{B} \cdot e_j$, $e_j \cdot \mathfrak{B} \cdot v$, $e_i \cdot \mathfrak{B} \cdot e_j$, $v \cdot \mathfrak{B} \cdot e_j$.

7-62. Show that there is a complete equivalence between dyadics and matrices
for the operations defined in Problem 7-61. That is, show that $\mathfrak{B}$ is completely
described by $B$ and that

$$v \cdot \mathfrak{B} \to v'B, \quad \text{or} \quad \mathfrak{B} \cdot v \to Bv, \quad \text{or} \quad v \cdot \mathfrak{B} \cdot v \to v'Bv.$$

7-63. Plot $F = 3x_1^2 + 2x_1x_2 + 4x_2^2$ for several different values of $F$. Find
the principal axes for this quadratic form and illustrate geometrically.


Problems Involving Complex Numbers 

7-64. Show that if $A$ is an nth-order matrix with real elements, there does
not exist a matrix $P$ (with real or complex elements) such that $P^{-1}AP$ is diagonal
unless $A$ has $n$ linearly independent eigenvectors.

7-65. Find the eigenvectors and eigenvalues of the matrix



7-66. Show that if $A$ is an nth-order matrix with real elements and there
exists a matrix $P$ such that $D = P^{-1}AP$ is diagonal, then $P$ contains as columns
a set of $n$ linearly independent eigenvectors of $A$, and the diagonal elements of
$D$ are the eigenvalues of $A$.

7-67. Let $Q$ be an orthogonal matrix with real elements. Show that if $\lambda$ is
an eigenvalue of $Q$, so is $1/\lambda$.

7-68. The eigenvalues of an orthogonal matrix with real elements can be
complex. Show that the absolute value of any eigenvalue is unity, that is,
$\lambda^*\lambda = 1$, and demonstrate that there exists a real $\theta$ such that $\lambda = e^{i\theta} =
\cos\theta + i \sin\theta$. Hint: $Qx = \lambda x$ and $(x^*)'Q' = \lambda^*(x^*)'$.

7-69. If $\lambda$ is an eigenvalue of the square matrix $A$ with real elements, show that
$\lambda^*$ is also an eigenvalue of $A$. What is the relation between the corresponding
eigenvectors?

7-70. If $Q$ is a real $3 \times 3$ orthogonal matrix ($|Q| = 1$), show that one of its
eigenvalues is $\lambda = 1$. This result has an important implication for the theory
of mechanics. It means that in any arbitrary combination of rotations of a
rigid body with one point fixed in space, one vector remains unaltered. We
can then accomplish all rotations by a single rotation, using the unchanged vector
as the axis of rotation. This is known as Euler’s theorem.

7-71. Prove that the eigenvalues of a Hermitian matrix are real (see Problem
3-83 for the definition of a Hermitian matrix).

7-72. Prove that the eigenvectors corresponding to different eigenvalues of a
Hermitian matrix are orthogonal in the sense that the Hermitian scalar product
vanishes.

7-73. Prove that if $H$ is a Hermitian matrix, then there exists at least one set
of $n$ eigenvectors $u_j$ which satisfies

$$(u_i^*)'u_j = \delta_{ij} \quad (\text{all } i, j).$$

Show that if $U = (u_1, \ldots, u_n)$, then $D = U^{-1}HU$ is a diagonal matrix, its diagonal
elements being the eigenvalues of $H$. Show that $(U^*)' = U^{-1}$. Such a matrix
is called a unitary matrix. Thus any Hermitian matrix can be diagonalized by a
unitary similarity transformation. For matrices with complex elements, unitary
matrices play a role similar to that played by orthogonal matrices for matrices
whose elements are real.

7-74. Find the eigenvalues and an orthogonal set of eigenvectors of unit
length for

H = 



4 + 1 
3 


Let $U$ be a matrix whose columns are the eigenvectors of $H$. Show that $D =
U^{-1}HU$ is diagonal, and that the diagonal elements of $D$ are the eigenvalues of $H$.